(54) Title: PROCESS FOR THE SELECTION OF HIV-1 SUBTYPE C ISOLATES, SELECTED HIV-1 SUBTYPE ISOLATES,
THEIR GENES AND MODIFICATIONS AND DERIVATIVES THEREOF 



(57) Abstract: The invention provides a process for the selection of HIV-1 subtype (clade) C isolates, selected HIV-1 subtype C
isolates, their genes and modifications and derivatives thereof for use in prophylactic and therapeutic vaccines to produce proteins and
polypeptides for the purpose of eliciting protection against HIV infection or disease. The process for the selection of HIV subtype
isolates comprises the steps of isolating viruses from recently infected subjects; generating a consensus sequence for at least part of at
least one HIV gene by identifying the most common codon or amino acid among the isolated viruses; and selecting the isolated virus
or viruses with a high sequence identity to the consensus sequence. HIV-1 subtype C isolates, designated Du422, Du151 and
Du179 (assigned Accession Numbers 01032114, 00072724 and 00072725, respectively, by the European Collection of Cell Cultures)
are also provided.
PROCESS FOR THE SELECTION OF HIV-1 SUBTYPE C ISOLATES,
SELECTED HIV-1 SUBTYPE ISOLATES, THEIR GENES
AND MODIFICATIONS AND DERIVATIVES THEREOF

BACKGROUND TO THE INVENTION 

THIS invention relates to a process for the selection of HIV-1 subtype (clade) C 
isolates, selected HIV-1 subtype C isolates, their genes and modifications and 
derivatives thereof for use in prophylactic and therapeutic vaccines to produce proteins 
and polypeptides for the purpose of eliciting protection against HIV infection or 
disease. 

The disease acquired immunodeficiency syndrome (AIDS) is caused by human 
immunodeficiency virus (HIV). Over 34 million people worldwide are thought to be 
living with HIV/AIDS, with over 90% of infected people living in developing countries 
(UNAIDS, 1999). It is estimated that 24 million infected people reside in sub-Saharan 
Africa and that South Africa currently has one of the world's fastest growing HIV-1 
epidemics. At the end of 1999, over 22 % of pregnant women attending government 
antenatal clinics in South Africa were HIV positive (Department of Health, 2000). A 
preventative vaccine is considered to be the only feasible way to control this epidemic 
in the long term. 

HIV shows remarkable genetic diversity that has confounded the development of a 
vaccine. The molecular basis of variation resides in the viral enzyme reverse 
transcriptase which not only introduces an error every round of replication, but also 
promotes recombination between viral RNAs. Based on phylogenetic analysis of 
sequences, HIV has been classified into a number of groups: the M (major group) 
which comprises subtypes A to H and K, the O (outlier group) and the N (non-M, non-O
group). Recently recombinant viruses have been more frequently identified and there 
are a number which have spread significantly and established epidemics (circulating 
recombinant forms or CRF) such as subtype A/G recombinant in West Africa, and CRF 
A/E recombinant in Thailand (Robertson et al., 2000).



Subtype C predominates in the Southern African region which includes Botswana,
Zimbabwe, Zambia, Malawi, Mozambique and South Africa. In addition, increasing 
numbers of subtype C infections are being detected in the Southern region of 
Tanzania. This subtype also predominates in Ethiopia and India and is becoming 
more important in China. 

A possible further obstacle to vaccine development is that the biological properties of 
HIV change as disease progresses. HIV requires two receptors to infect cells: CD4
and a co-receptor, of which CCR5 and CXCR4 are the major co-receptors used by HIV-1
strains. The most commonly transmitted phenotype is non-syncytium inducing (NSI),
macrophage-tropic viruses that utilise the CCR5 co-receptor for entry (R5 viruses).
Langerhans cells in the mucosa are thought to selectively pick up R5 variants at the 
portal of entry and transport them to the lymph nodes where they undergo replication 
and expansion. As the infection progresses, viruses evolve that have increased 
replicative capacity and the ability to grow in T cell lines. These syncytium-inducing (SI) 
T-tropic viruses use CXCR4 in conjunction with or in preference to CCR5, and in some 
cases also use other minor co-receptors (Connor et al., 1997, Richman & Bozzette,
1994). However HIV-1 subtype C viruses appear to be unusual in that they do not
readily undergo this phenotypic switch, as R5 viruses are also predominant in patients
with advanced AIDS (Bjorndal et al., 1999, Peeters et al., 1999, Ping et al., 1999,
Tscherning et al., 1998, Scarlatti et al., 1997).

SUMMARY OF THE INVENTION 

According to one aspect of the invention a process for the selection of HIV subtype 
isolates for use in the development of prophylactic and therapeutic pharmaceutical
compositions comprises the following steps:

isolating viruses from recently infected subjects; 

generating a consensus sequence for at least part of at least one HIV gene by 
identifying the most common codon or amino acid among the isolated viruses 
at each position along at least part of the gene; and 
selecting the isolated virus or viruses with a high sequence identity to the
consensus sequence, and a phenotype which is associated with transmission for
the particular HIV subtype.

The isolated virus may be of the same subtype as a likely challenge strain. 

The HIV subtype is preferably HIV-1 subtype C. 

For HIV-1 subtype C, the phenotype which is associated with transmission is typically a 
virus that utilises the CCR5 co-receptor and is non-syncytium inducing (NSI).

According to another aspect of the invention an HIV-1 subtype C isolate, designated 
Du422 and assigned Provisional Accession Number 01032114 by the European 
Collection of Cell Cultures, is provided. 

According to another aspect of the invention an HIV-1 subtype C isolate, designated 
Du151 and assigned Accession Number 00072724 by the European Collection of Cell 
Cultures, is provided. 

According to another aspect of the invention an HIV-1 subtype C isolate, designated 
Du179 and assigned Accession Number 00072725 by the European Collection of Cell 
Cultures, is provided. 

According to another aspect of the invention a molecule is provided, the molecule 
having: 

(i) the nucleotide sequence set out in Sequence I.D. No. 1;

(ii) an RNA sequence corresponding to the nucleotide sequence set out in 
Sequence I.D. No. 1; 

(iii) a sequence which will hybridise to the nucleotide sequence set out in 
Sequence I.D. No. 1 or an RNA sequence corresponding to it, under 
strict hybridisation conditions; 
(iv) a sequence which is homologous to the nucleotide sequence set out in
Sequence I.D. No. 1 or an RNA sequence corresponding to it; or 

(v) a sequence which is a modification or derivative of the sequence of 
any one of (i) to (iv). 



The modified sequence is preferably that set out in Sequence I.D. No. 7.

According to another aspect of the invention a molecule is provided, the molecule
having:

(i) the nucleotide sequence set out in Sequence I.D. No. 3;

(ii) an RNA sequence corresponding to the nucleotide sequence set out in
Sequence I.D. No. 3;

(iii) a sequence which will hybridise to the nucleotide sequence set out in
Sequence I.D. No. 3 or an RNA sequence corresponding to it, under
strict hybridisation conditions;

(iv) a sequence which is homologous to the nucleotide sequence set out in
Sequence I.D. No. 3 or an RNA sequence corresponding to it; or

(v) a sequence which is a modification or derivative of the sequence of
any one of (i) to (iv).

The modified sequence is preferably that set out in Sequence I.D. No. 9.



According to another aspect of the invention a molecule is provided, the molecule
having:

(i) the nucleotide sequence set out in Sequence I.D. No. 5;

(ii) an RNA sequence corresponding to the nucleotide sequence set out in
Sequence I.D. No. 5;

(iii) a sequence which will hybridise to the nucleotide sequence set out in
Sequence I.D. No. 5 or an RNA sequence corresponding to it, under
strict hybridisation conditions;

(iv) a sequence which is homologous to the nucleotide sequence set out in
Sequence I.D. No. 5 or an RNA sequence corresponding to it; or

(v) a sequence which is a modification or derivative of the sequence of
any one of (i) to (iv).

The modified sequence is preferably that set out in Sequence I.D. No. 11.

According to another aspect of the invention a molecule is provided, the molecule
having:

(i) the nucleotide sequence set out in Sequence I.D. No. 13;

(ii) an RNA sequence corresponding to the nucleotide sequence set out in
Sequence I.D. No. 13;

(iii) a sequence which will hybridise to the nucleotide sequence set out in
Sequence I.D. No. 13 or an RNA sequence corresponding to it, under
strict hybridisation conditions;

(iv) a sequence which is homologous to the nucleotide sequence set out in
Sequence I.D. No. 13 or an RNA sequence corresponding to it; or

(v) a sequence which is a modification or derivative of the sequence of
any one of (i) to (iv).

The modified sequence preferably has similar or the same modifications as those set
out in Sequence I.D. No. 11 for the env gene of the isolate Du151.

According to another aspect of the invention a polypeptide is provided, the polypeptide
having:

(i) the amino acid sequence set out in Sequence I.D. No. 2; or

(ii) a sequence which is a modification or derivative of the amino acid
sequence set out in Sequence I.D. No. 2.

The modified sequence is preferably that set out in Sequence I.D. No. 8.

According to another aspect of the invention a polypeptide is provided, the polypeptide
having:

(i) the amino acid sequence set out in Sequence I.D. No. 4; or

(ii) a sequence which is a modification or derivative of the amino acid
sequence set out in Sequence I.D. No. 4. 

The modified sequence is preferably that set out in Sequence I.D. No. 10. 

According to another aspect of the invention a polypeptide is provided, the polypeptide 
having: 

(i) the amino acid sequence set out in Sequence I.D. No. 6; or 

(ii) a sequence which is a modification or derivative of the amino acid 
sequence set out in Sequence I.D. No. 6. 

The modified sequence is preferably that set out in Sequence I.D. No. 12. 

According to another aspect of the invention a polypeptide is provided, the polypeptide 
having: 

(i) the amino acid sequence set out in Sequence I.D. No. 14; or

(ii) a sequence which is a modification or derivative of the amino acid 
sequence set out in Sequence I.D. No. 14. 

The modified sequence preferably has similar or the same modifications as those set 
out in Sequence I.D. No. 12 for the amino acid sequence of the env gene of the isolate 
Du151. 



According to another aspect of the invention a consensus amino acid sequence for the 
partial gag gene of HIV-1 subtype C is the following: 

GEKLDKWEKI RLRPGGKKHY MLKHLVWASR ELERFALNPG LLETSEGCKQ 50 
IMKQLQPALQ TGTEELRSLY NTVATLYCVH EKIEVRDTKE ALDKIEEEQN 100 
KSQQ-CQQKT QQAKAADGG- KVSQNYPIVQ NLQGQMVHQA ISPRTLNAWV 150 
EEKAFSP EVIPMFTALS EGATPQDLNT MLNTVGGHQA AMQMLKDTIN 200 

EEAAEWDRLH PVHAGPIAPG QMREPRGSDI AGTTSTLQEQ IAWMTSNPPI 250 
PVGDIYKRWI ILGLNKIVRM YSPVSILDIK QGPKEPFRDY VDRFFKTLRA 300 
EQATQDVKNW MTD 313 
According to another aspect of the invention a consensus amino acid sequence for the
partial pol gene of HIV-1 subtype C is the following: 

LTEEKIKALT AICEEMEKEG KITKIGPENP YNTPVFAIKK KDSTKWRKL- 50 

VDFRELNKRT QDFWEVQLGI PHPAGLKKKK SVTVLDVGDA YFSVPLDEGF 100 

RKYTAFTIPS INNETPGIRY QYNVLPQGWK GSPAIFQSSM TKILEPFRAK 150 

NPEIVIYQYM DDLYVGSDLE IGQHRAKIEE LREHLLKWGF TTPDKKHQKE 200 

PPFLWMGYEL HPDKWTVQPI QLPEKDSWTV NDIQKLVGKL NWASQIYPGI 250 
KVRQLCKLLR GAKALTDIVP LTEEAELE 270 

According to another aspect of the invention a consensus amino acid sequence for the 
partial env gene of HIV-1 subtype C is the following: 

YCAPAGYAIL KCNNKTFNGT GPCNNVSTVQ CTHGIKPVVS TQLLLNGSLA 50

EEEIIIRSEN LTNNAKTIIV HLNESVEIVC TRPNNNTRKS IRIGPGQTFY 100

ATGDIIGDIR QAHCNISEGK WNKTLQKVKK KLKEELYKYK VVEIKPLGIA 150

PTEAKRRVVE REKRAVGIGA VFLGFLGAAG STMGAASITL TVQARQLLSG 200
IVQQQSNLLR AIEAQQHMLQ LTVWGIKQL 229



DESCRIPTION OF THE DRAWINGS 



Figure 1 shows a schematic representation of the HIV-1 genome and
illustrates the location of overlapping fragments that were
sequenced having been generated by reverse transcriptase
followed by polymerase chain reaction, in order to generate the
South African consensus sequence;

Figure 2 shows a phylogenetic tree of nucleic acid sequences of various
HIV-1 subtype C isolates based on the (partial) sequences of the
gag gene of the various isolates and includes a number of
consensus sequences as well as the South African consensus
sequence of the present invention and a selected isolate, Du422, of
the present invention;

Figure 3 shows a phylogenetic tree of nucleic acid sequences of various
HIV-1 subtype C isolates based on the (partial) sequences of the
pol gene of the various isolates and includes a number of
consensus sequences as well as the South African consensus
sequence of the present invention and a selected isolate, Du151, of
the present invention;

Figure 4 shows a phylogenetic tree of nucleic acid sequences of various
HIV-1 subtype C isolates based on the (partial) sequences of the
env gene of the various isolates and includes a number of
consensus sequences as well as the South African consensus
sequence of the present invention and a selected isolate, Du151, of
the present invention;

Figure 5 shows how the sequences of the gag genes of each of a number of
isolates vary from the South African consensus sequence for the
gag gene which was developed according to the present invention;

Figure 6 shows how the sequences of the pol genes of each of a number of
isolates vary from the South African consensus sequence for the
pol gene which was developed according to the present invention;

Figure 7 shows how the sequences of the env genes of each of a number of
isolates vary from the South African consensus sequence for the
env gene which was developed according to the present invention;

Figure 8 shows a phylogenetic tree of amino acid sequences of various
HIV-1 subtype C isolates based on the sequences of the (partial) gag
gene of the various isolates and includes a number of consensus
sequences as well as the South African consensus sequence of the
present invention and a selected isolate, Du422, of the present
invention;
Figure 9 shows a phylogenetic tree of amino acid sequences of various
HIV-1 subtype C isolates based on the sequences of the (partial) pol
gene of the various isolates and includes a Cpol consensus
sequence as well as a South African consensus sequence of the
present invention and a selected isolate, Du151, of the present
invention;

Figure 10 shows a phylogenetic tree of amino acid sequences of various
HIV-1 subtype C isolates based on the sequences of the (partial) env
gene of the various isolates and includes a Cenv consensus
sequence as well as a South African consensus sequence of the
present invention and a selected isolate, Du151, of the present
invention;

Figure 11 shows the percentage amino acid sequence identity of the
sequenced gag genes of the various isolates in relation to one
another, to the gag clone and to the South African consensus
sequence for the gag gene and is based on a pairwise comparison
of the gag genes of the isolates;

Figure 12 shows the percentage amino acid sequence identity of the
sequenced pol genes of the various isolates in relation to one
another, to the pol clone and to the South African consensus
sequence for the pol gene and is based on a pairwise comparison
of the pol genes of the isolates;

Figure 13 shows the percentage amino acid sequence identity of the
sequenced env genes of the various isolates in relation to one
another, to the env clone and to the South African consensus
sequence for the env gene and is based on a pairwise comparison
of the env genes of the isolates;

Figure 14 shows a phylogenetic tree analysis of nucleic acid sequences of
various HIV-1 subtype C isolates (or vaccine strains) based on the
complete sequences of the gag genes of the various isolates and
shows the gag gene from a selected isolate, Du422, of the present
invention compared to the other subtype C sequences;

Figure 15 shows a phylogenetic tree analysis of nucleic acid sequences of
various HIV-1 subtype C isolates (or vaccine strains) based on the
complete sequences of the pol genes of the various isolates and
shows the pol gene from a selected isolate, Du151, of the present
invention compared to the other subtype C sequences;

Figure 16 shows a phylogenetic tree analysis of nucleic acid sequences of
various HIV-1 subtype C isolates (or vaccine strains) based on the
complete sequences of the env gene of the various isolates and
shows the env gene from a selected isolate, Du151, of the present
invention compared to the other subtype C sequences; and



LIST OF SEQUENCES 

Sequence I.D. No 1 shows the nucleic acid sequence (cDNA) of the sequenced 
gag gene of the isolate Du422; 

Sequence I.D. No 2 shows the amino acid sequence of the sequenced gag gene of 
the isolate Du422, derived from the nucleic acid sequence; 

Sequence I.D. No 3 shows the nucleic acid sequence (cDNA) of the sequenced pol 
gene of the isolate Du151; 

Sequence I.D. No 4 shows the amino acid sequence of the sequenced pol gene of 
the isolate Du151, derived from the nucleic acid sequence; 



Sequence I.D. No 5 shows the nucleic acid sequence (cDNA) of the sequenced
env gene of the isolate Du151;

Sequence I.D. No 6 shows the amino acid sequence of the sequenced env gene of
the isolate Du151, derived from the nucleic acid sequence; 

Sequence I.D. No 7 shows the nucleic acid sequence (DNA) of the resynthesized 
sequenced gag gene of the isolate Du422 modified to reflect 
human codon usage for the purposes of increased expression; 

Sequence I.D. No 8 shows the amino acid sequence of the resynthesized 
sequenced gag gene of the isolate Du422 modified to reflect 
human codon usage for the purposes of increased expression; 

Sequence I.D. No 9 shows the nucleic acid sequence (DNA) of the resynthesized 
sequenced pol gene of the isolate Du151 modified to reflect 
human codon usage for the purposes of increased expression; 

Sequence I.D. No 10 shows the amino acid sequence of the resynthesized 
sequenced pol gene of the isolate Du151 modified to reflect 
human codon usage for the purposes of increased expression; 

Sequence I.D. No 11 shows the nucleic acid sequence (DNA) of the resynthesized 
sequenced env gene of the isolate Du151 modified to reflect 
human codon usage for the purposes of increased expression; 

Sequence I.D. No 12 shows the amino acid sequence of the resynthesized 
sequenced env gene of the isolate Du151 modified to reflect 
human codon usage for the purposes of increased expression; 

Sequence I.D. No 13 shows the nucleic acid sequence (cDNA) of the sequenced
env gene of the isolate Du179; and 

Sequence I.D. No 14 shows the amino acid sequence of the sequenced env gene of 
the isolate Du179. 
DETAILED DESCRIPTION OF THE INVENTION

This invention relates to the selection of HIV-1 subtype isolates and the use of their 
genes and modifications and derivatives thereof in making prophylactic and 
therapeutic pharmaceutical compositions and formulations, and in particular vaccines 
against HIV-1 subtype C. The compositions could therefore be used either 
prophylactically to prevent infection or therapeutically to prevent or modify disease. A 
number of factors must be taken into consideration in the development of an HIV 
vaccine and one aspect of the present invention relates to a process for the selection 
of suitable HIV isolates for the development of a vaccine. 

The applicant envisages that the vaccine developed according to the above method 
could be used against one or more HIV subtypes other than HIV-1 subtype C. 

An HIV vaccine aims to elicit both a CD8+ cytotoxic T lymphocyte (CTL) immune 
response as well as a neutralizing antibody response. Many current vaccine 
approaches have primarily focused on inducing a CTL response. It is thought that the 
CTL response may be more important as it is associated with the initial control of viral 
replication after infection, as well as control of replication during disease, and is 
inversely correlated with disease progression (Koup et al., 1994, Ogg et al., 1999,
Schmitz et al., 1999). The importance of CTL in protecting individuals from infection is
demonstrated by their presence in highly exposed seronegative individuals such as
sex-workers (Rowland-Jones et al., 1998).

Knowledge of genetic diversity is highly relevant to the design of vaccines aiming at 
eliciting a cytotoxic T-lymphocyte (CTL) response. There are many CTL epitopes in 
common between viruses, particularly in the gag and pol region of the genome (HIV 
Molecular Immunology Database, 1998). In addition, several studies have now shown 
that there is a cross-reactive CTL response: individuals vaccinated with a subtype
B-based vaccine could lyse autologous targets infected with a diverse group of isolates
(Ferrari et al., 1997); and CTLs from non-B infected individuals could lyse subtype
B-primed targets (Betts et al., 1997; Durali et al., 1998). A comparison of CTL epitopes in
the HIV-1 sequence database shows about 40% of gp41 and 84% of p24 epitopes are
identical or have only one amino acid difference between subtypes. Although this is a
very crude analysis and does not take into consideration populations or dominant 
responses to certain epitopes, it does however indicate that there is a greater 
conservation of cytotoxic T epitopes within a subtype compared to between subtypes 
and that there will be a greater chance of a CTL response if the challenge virus is the 
same subtype as the vaccine strain. 

The importance of genetic diversity in inducing a neutralizing antibody response 
appears to be less crucial. In general, neutralization serotypes are not related to 
genetic subtype. Some individuals elicit antibodies that can neutralize a broad range of 
viruses, including viruses of different subtypes while others fail to elicit effective 
neutralizing antibodies at all (Wyatt and Sodroski, 1998; Kostrikis et al., 1996; Moore
et al., 1996). As neutralizing antibodies are largely evoked against functional domains
of the virus which are essentially conserved, it is probable that HIV-1 genetic diversity 
may not be relevant in producing a vaccine designed to elicit neutralizing antibodies. 

Viral strains used in the design of a vaccine need to be shown by genotypic analysis to 
be representative of the circulating strains and not an unusual or outlier strain. In 
addition, it is important that a vaccine strain also has the phenotype of a recently 
transmitted virus, which is NSI and uses the CCR5 co-receptor.

A process was developed to identify appropriate strains for use in developing a 
vaccine for HIV-1 subtype C. Viral isolates from acutely infected individuals were 
collected. They were sequenced in the env, gag and pol regions and the amino acid 
sequences for the env, gag and pol genes from these isolates were compared. A
consensus sequence, the South African consensus sequence, was then formed by
selecting the most frequently appearing amino acid at each position. The consensus
sequence for each of the gag, pol and env genes of HIV-1 subtype C also forms an 
aspect of the invention. Appropriate strains for vaccine development were then 
selected from these isolates by comparing them with the consensus sequence and 
characterising them phenotypically. The isolates also form an aspect of the invention. 
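
The consensus step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the applicant's software: it assumes the amino acid sequences have already been aligned to equal length, the function name consensus and the toy sequences (hypothetical variants of the first ten residues of the gag consensus) are invented for the example, and ties are broken arbitrarily by first appearance in the column.

```python
from collections import Counter


def consensus(aligned):
    """Return the most frequent residue at each column of aligned sequences."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    residues = []
    for column in zip(*aligned):
        # most_common(1) returns the residue with the highest count; on a tie
        # the residue seen first in the column wins (an arbitrary choice).
        residues.append(Counter(column).most_common(1)[0][0])
    return "".join(residues)


# Hypothetical, pre-aligned toy sequences (not data from the isolates).
toy = ["GEKLDKWEKI", "GEKLDRWEKI", "GEKLDKWERI", "GDKLDKWEKI"]
print(consensus(toy))
```

Gap characters such as "-" are treated here like any other symbol, so a column in which most isolates carry a gap yields a gap in the consensus.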

In order to select for NSI strains which use the CCR5 co-receptor, a well established
sex worker cohort was used to identify the appropriate strains. Appropriate strains
were identified from acutely infected individuals by comparing them with the consensus
sequence which had been formed. Viral isolates from fifteen acutely infected
individuals were sequenced in the env, gag and pol regions and phenotypically characterised.
These sequences were compared with viral isolates from fifteen asymptomatic
individuals from another region having more than 500 CD4 cells and other published
subtype C sequences located in the Los Alamos Database (http://ww.hiv-web.lanl.gov/).

Three potential vaccine strains, designated Du151, Du422 and Du179, were selected. 
Du151 and Du422 were selected based on amino acid homology to the consensus
sequence in all three gene regions env, gag and pol, CCR5 tropism and ability to grow
and replicate in tissue culture. Du179 is an R5X4 virus and was selected because the
patient in which this strain was found showed a high level of neutralising antibodies.
The nucleotide and amino acid sequences of the three gene regions of the three 
isolates and modifications and derivatives thereof also form aspects of the invention. 
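
The comparison of each isolate with the consensus sequence can be illustrated as a percentage identity calculation followed by a ranking. The sketch below is a hedged example rather than the method actually used: it assumes pre-aligned amino acid sequences, skips positions where either sequence has a gap (one of several possible conventions), and uses hypothetical isolate names and toy sequences.

```python
def percent_identity(seq, reference, gap="-"):
    """Percentage of compared positions at which seq matches reference.

    Positions where either sequence carries a gap are skipped (an assumed
    convention; other definitions of identity exist).
    """
    if len(seq) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq, reference):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def rank_isolates(isolates, reference):
    """Return (name, identity) pairs, closest to the reference first."""
    scored = [(name, percent_identity(seq, reference))
              for name, seq in isolates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data; the names and sequences are not from the cohort.
reference = "GEKLDKWEKI"
isolates = {
    "isolate_A": "GEKLDKWEKI",
    "isolate_B": "GEKLDRWERI",
    "isolate_C": "GDRLDRWERI",
}
for name, identity in rank_isolates(isolates, reference):
    print(name, round(identity, 1))
```

In practice the ranking would be only one input to selection, alongside the co-receptor phenotype and growth in tissue culture described in the text.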

The vaccines of the invention will be formulated in a number of different ways using a 
variety of different vectors. They involve encapsulating RNA or transcribed DNA 
sequences from the viruses in a variety of different vectors. The vaccines will contain 
at least part of the gag gene from the Du422 isolate, and at least part of the pol and 
env genes from the Du151 isolate of the present invention and/or at least part of the 
env gene from the Du179 isolate of the present invention or derivatives or 
modifications thereof. 

Genes for use in DNA vaccines have been resynthesized to reflect human codon 
usage. The gag Du422 gene was designed so that the myristylation site and inhibitory 
sequences were removed. Similarly resynthesized gp160 (the complete env gene
consisting of gp120 and gp41) and pol genes will be expressed by DNA vaccines.
The gp160 gene sequence has also been changed as described above for the gag 
gene to reflect human codon usage and the rev responsive element removed. The 
protease, inactivated reverse transcriptase and start of the RNAse H genes from 
Du151 pol are optimised for increased expression and will be joined with gag at an 
inserted Bgl1 site. The gag-pol frameshift will be maintained to keep the natural 
balance of gag to pol protein expression. 
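
Resynthesis to reflect human codon usage can be pictured as re-encoding a protein sequence with one preferred codon per amino acid. The following is a minimal sketch under stated assumptions: the PREFERRED table is an assumed, illustrative choice of frequently used human codons and is not taken from this specification, the function name recode is hypothetical, and the sketch does not attempt the other changes described above (removal of the myristylation site, inhibitory sequences or the rev responsive element).

```python
# Assumed table for illustration only: one frequently used human codon per
# amino acid. Real designs typically consult a full codon usage table.
PREFERRED = {
    "A": "GCC", "R": "AGA", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",
}


def recode(protein):
    """Encode a protein sequence as DNA using the assumed preferred codons."""
    codons = []
    for residue in protein.upper():
        if residue not in PREFERRED:
            raise ValueError(f"no codon for residue {residue!r}")
        codons.append(PREFERRED[residue])
    return "".join(codons)


# First ten residues of the partial gag consensus given in this document.
print(recode("GEKLDKWEKI"))
```

Because every amino acid is mapped to a single codon, the encoded protein is unchanged while the nucleotide sequence differs from the viral original; this is the sense in which the amino acid sequences of the resynthesized genes are listed separately from the DNA sequences.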
Another vaccine will contain DNA transcribed from the RNA for the gag gene from the
Du422 isolate and RNA from the pol and env genes from the Du151 isolate and/or
RNA from the env gene from the Du179 isolate. These genes could also be expressed
as oligomeric envelope glycoprotein complexes (Progenics, USA) as published in J
Virol 2000 Jan;74(2):627-43 (Binley, J.L. et al.), the adeno associated virus (AAV)
(Target Genetics) and the Venezuelan equine encephalitis virus (US patent
application USSN 60/216,995, which is incorporated herein by reference).

The isolation and selection of viral strains for the design of a vaccine 

The following criteria were used to select appropriate strains for inclusion into HIV-1 
vaccines for Southern Africa: 

that the strains be genotypically representative of circulating strains; 

that the strain not be an outlier strain; 

that the strain be as close as possible to the consensus amino acid sequence 
developed according to the invention for the env, gag and pol genes of HIV-1 
subtype C; 

that the strain have an R5 phenotype, i.e. a phenotype associated with 
transmission for selection of the RNA or cDNA to be included for the env 
region; and 

that the vaccine be able to be grown in tissue culture. 

The following procedure was followed in the selection of viral strains for the design of a 
vaccine. A well-established sex worker cohort in Kwazulu-Natal, South Africa was
used to identify the appropriate strains for use in an HIV vaccine. Viral isolates from 15
acutely infected individuals were sequenced in env, gag and pol and were also isolated
and phenotypically characterised. These sequences were compared with a similar
collection from asymptomatic individuals from the Gauteng region in South Africa as
well as other published subtype C sequences. 

Patients 

Individuals with HIV infection were recruited from 4 regions in South Africa. Blood 
samples were obtained from recently infected sex workers from Kwazulu-Natal (n=13).
Recent infection was defined as individuals who were previously seronegative and had
become seropositive within the previous year. Samples were also collected from
individuals attending out-patient clinics in Cape Town (n=2), women attending
antenatal clinics in Johannesburg (n=7) and men attending an STD clinic on a gold mine
outside Johannesburg (n=8). The latter 2 groups were clinically stable and were
classified as asymptomatic infections. Blood samples were collected in EDTA and
used to determine the CD4 T cell count and for genetic analysis of the virus. In the case
of recent infections a branched chain DNA (bDNA) assay (Chiron) to measure plasma viral
load was done, and the virus was isolated. HIV-1 serostatus was determined by
ELISA. The results of the CD4 T cell counts and the viral loads on the sex workers
were established and information on the clinical status as at date of seroconversion,
CD4, and data on the co-receptor usage is set out in Table 1 below.



Virus isolation

HIV was isolated from peripheral blood mononuclear cells (PBMC) using standard
co-culture techniques with mitogen-activated donor PBMC. 2x10^6 patient PBMC were
co-cultured with 2x10^6 donor PBMC in 12 well plates with 2 ml RPMI 1640 with 20% FCS,
antibiotics and 5% IL-2 (Boehringer). Cultures were replenished twice weekly with
fresh medium containing IL-2 and once with 5x10^5/ml donor PBMC. Virus growth was
monitored weekly using a commercial p24 antigen assay (Coulter). Antigen positive
cultures were expanded and cultured for a further 2 weeks to obtain 40 ml of virus
containing supernatant which was stored at -70°C until use. The results of the
isolation of the viruses from the commercial sex workers are also shown in Table 1
below.

Viral phenotypes 

Virus-containing supernatant was used to assess the biological phenotype of viral 
isolates on MT-2 and co-receptor transfected cell lines. For the MT-2 assay, 500 µl of
supernatant was incubated with 5x10^4 MT-2 cells in RPMI plus 10% FCS and
antibiotics. Cultures were monitored daily for syncytia formation over 6 days. U87.CD4
cells expressing either the CCR5 or CXCR4 co-receptor were grown in DMEM with 10%
FCS, antibiotics, 500 µg/ml G418 and 1 µg/ml puromycin. GHOST cells expressing
minor co-receptors were grown in DMEM with 10% FCS, 500 µg/ml G418, 1 µg/ml
puromycin and 100 µg/ml hygromycin. Cell lines were passaged twice weekly by
trypsinisation. Co-receptor assays were done in 12 well plates; 5x10^4 cells were plated
in each well and allowed to adhere overnight. The following day 500 µl of virus
containing supernatant was added and incubated overnight to allow viral attachment
and infection and washed three times the following day. Cultures were monitored on
days 4, 8 and 12 for syncytia formation and p24 antigen production. Cultures that
showed evidence of syncytia and increasing concentrations of p24 antigen were
considered positive for viral growth. The results of co-receptor usage of the viruses
from the commercial sex workers are also shown in Table 1.

Sequencing

RNA was isolated from plasma and the gene fragments were amplified from RNA 
using reverse transcriptase to generate a cDNA followed by PCR to generate amplified 
DNA segments. The positions of the PCR primers are as follows, with the second of 
each primer pair being used as the reverse transcriptase primer in the cDNA synthesis 
step (numbering using the HIV-1 HXBr sequence): gag1 (790-813, 1282-1303), gag2
(1232-1253, 1797-1820), pol1 (2546-2573, 3012-3041), pol2 (2932-2957, 3492-3515),
env1 (6815-6838, 7322-7349), env2 (7626-7653, 7963-7986). The amplified
DNA fragments were purified using the QIAQUICK PCR Purification Kit (Qiagen, 
Germany). The DNA fragments were then sequenced using the upstream PCR 
primers as sequencing primers. Sequencing was done using the Sanger 
dideoxyterminator strategy with fluorescent dyes attached to the dideoxynucleotides. 
The sequence determination was made by electrophoresis using an ABI 377
Sequencer. A mapped illustration of an HIV-1 proviral genome showing the pol, gag
and env regions sequenced as described above, is shown in Figure 1. The following
regions were sequenced (numbering according to HXBr, Los Alamos database): 813 -
1282 (gag1); 1253 - 1797 (gag2); 2583 - 3012 (pol1); 2957 - 3515 (pol2); 6938 -
7322 (env1); 7653 - 7963 (env2), as illustrated in Figure 1.

Genotypic characterisation 

To select the vaccine isolate or isolates, a survey covering portions of the three major 
HIV genes gag (313 contiguous codons, 939 bases), pol (278 contiguous codons, 834 
bases) and env (229 codons in two noncontiguous segments, 687 bases) was done
(Figure 1). The map of Figure 1 shows the 5' long terminal repeat, the structural and
functional genes (gag, pol and env) as well as the regulatory and accessory proteins
(vif, tat, rev, nef, vpr and vpu). The gag open reading frame illustrates the regions
encoding p17 matrix protein and the p24 core protein and the p7 and p6 nucleocapsid
proteins. The pol open reading frame illustrates the protease (PR) p15, reverse
transcriptase (RT) p66 and the RNase H integrase p51. The env open reading frame
indicates the region coding for gp120 and the region coding for gp41.

Of a total of 31 isolates, 14 were from the Durban cohort (DU), 15 were from
Johannesburg (GG and RB) and 2 from Cape Town (CT). Of these 30 were 
sequenced in the gag region, 26 in the pol region and 27 in the env region. The 
isolates that were sequenced are shown in Table 2. 

TABLE 2 - LIST OF ISOLATES AND THE GENE REGIONS SEQUENCED

Isolate   Gag sequence   Pol sequence   Env sequence
CTSC1                                   -
CTSC2                    ✓              -
DU115                                   ✓
DU123                    -              ✓
DU151     -                             ✓
DU156
DU172
DU174
DU179     ✓
DU204                                   ✓
DU258                                   ✓
DU281
DU368
DU422     ✓
DU457     ✓
DU467     ✓
GG1       ✓
GG10
GG3       ✓
GG4       ✓
GG5       ✓              ✓
GG6
RB12                                    ✓
RB13
RB14      ✓              ✓              ✓
RB15      ✓
RB18
RB21      ✓
RB22      ✓              ✓
RB27                                    ✓
RB28      ✓

The nucleic acid sequences from the Durban (DU), Johannesburg (GG, RB) and Cape
Town (CT) cohorts were phylogenetically compared to all available published subtype
C sequences (obtained from the Los Alamos HIV Sequence Database) including
sequences from the other southern African countries and the overall subtype C
consensus from the Los Alamos HIV sequence database. This comparison was done
to ensure that the selected vaccine isolates were not phylogenetic outliers when
compared to the Southern African sequences and the results of the comparison are
shown in Figure 2, Figure 3 and Figure 4. Figures 2 to 4 illustrate that the sequences
from Southern Africa are divergent and that the Indian sequences form a separate
distinct cluster from these African sequences. The South African sequences are not
unique and, in general, are as related to each other as they are to other sequences
from Southern Africa. Overall this suggests Indian sequences are distinct from
Southern African subtype C sequences and that we do not have a clonal epidemic in
South Africa, but rather South African viruses reflect the diversity of subtype C viruses
in the Southern African region.

Determination of a consensus sequence 

Amino acid sequences were derived from the sequences shown in Table 2 and were 
used to determine a South African consensus sequence. The most frequently 
appearing amino acid at each position was selected as the consensus amino acid at 
that position. In this way, the consensus sequence was determined along the linear 
length of each of the sequenced gene fragments (gag, pol and env gene fragments). 
The alignments were done using the Genetics Computer Group (GCG) programs 
(Pileup and Pretty), which generate a consensus sequence in this manner. This
resulted in the consensus sequence for each gene region. The alignments of the
amino acid sequences and the resulting consensus sequences are shown in Figures 5,
6 and 7.
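
The majority rule described above can be illustrated with a short sketch. It is not the GCG Pileup/Pretty implementation; the function name, the alphabetical tie-break and the four toy sequences are assumptions made purely for illustration.

```python
from collections import Counter

def consensus(aligned):
    """Return the most frequent residue at each column of an alignment.

    `aligned` is a list of equal-length, already aligned sequences.
    Ties are broken alphabetically (an illustrative assumption).
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned):
        counts = Counter(column)
        top = max(counts.values())
        # keep the alphabetically first residue among the most frequent ones
        result.append(min(r for r, c in counts.items() if c == top))
    return "".join(result)

# Hypothetical toy alignment, not data from the isolates
toy_alignment = ["GEKLDKWEKI", "GEKLDRWEKI", "GEQLDKWEKI", "GEKLDKWERI"]
print(consensus(toy_alignment))
```

Each variant residue in the toy alignment occurs in only one sequence, so the majority residue is retained at every position.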

The phylogenetic tree of amino acids showing a comparison of the South African 
sequences is set out in Figures 8, 9 and 10. The ES2 gag S, which is the sequence of 
the cloned Du422 gag gene, Du151 pol (clone number) 8, which is the sequence of the 
cloned Du151 pol gene, and Du151 env (clone number) 25, which is the sequence of 
the cloned Du151 env gene, are vaccine clones. It can be seen from Figures 8, 9 and 
10 that they are the same as the original isolates. These phylogenetic trees compare 
the relationship between the HIV proteins. South African isolates were compared with 
subtype A, B, C and D consensus sequences as well as with the South African 
consensus (Sagagcon) derived from the South African sequences, a Malawian 
consensus (Malgagcon) derived from Malawian sequences and overall consensuses 
(Cgagcon, Cpolcon and Cenvcon) derived from all subtype C sequences on the Los 
Alamos database. 

The final choice of which isolate or isolates to use was based on the similarity of the 
sequence of the gag, env and pol genes of a particular isolate to the South African 
consensus sequence which had been derived as set out above as well as the 
availability of an R5 isolate which had good replication kinetics as shown in Table 1. 

Selection of Vaccine Isolates 

Based on the considerations and methodology set out above, three strains were 
selected from the acute infection cohort as the vaccine strains. The first strain is
Du422 for the gag gene, the second strain is Du151 for the pol and env genes and the
third strain is Du179 which is a possible alternative for the env gene. These three 
strains were selected for the following reasons. 

1. At the time the samples were obtained, Du151 had been infected for 6 weeks
and had a CD4 count of 367 cells per ul of blood and a viral load above
500,000 copies per ml of plasma. Given the high viral load, and the recorded
time from infection, it is probable that the individual was still in the initial
stages of viraemia prior to control of HIV replication by the immune system.

2. At the time the samples were obtained, Du422 had been infected for 4 months
with a CD4 count of 397 cells per ul of blood and a viral load of 17,118 copies
per ml of plasma. In contrast to Du151, this individual had already brought
viral replication under control to a certain extent.

3. At the time the samples were obtained, Du179 had been infected for 21
months with a CD4 count of 394 cells per ul of blood and a viral load of 1,359
copies per ml of plasma.

Based on the analysis of the phylogenetic tree shown in Figure 8 showing the 
relationship between full length gp120 sequence and other isolates, and the amino 
acid pairwise comparison shown in Figure 11, the Du422 gag sequence was shown to 
be most similar to the South African consensus sequence shown in Figures 2 and 5. It 
shared 98% amino acid sequence identity with the consensus sequence. In addition, 
the average pairwise distance, which is the percentage difference between the DNA 
sequences, between the DU422 gag sequence and the other sequences from the 
seroconverters was the highest of any sequence derived from this cohort, at 93.5%, 
and nearly as high as the average distance of the isolates to the SA consensus 
sequence (94.2%). The Du422 gag gene was cloned and the specific clone gave 
values very similar to the original isolate: having a pairwise identity value with the SA
consensus of 98% and nearly as high an average identity value with the other
isolates as the DU422 isolate (93.3%). Thus, both the original DU422 isolate 
sequence and the generated clone had the highest pairwise percentage similarity to 
other isolates with the minimal values all being above 90%. 
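
The pairwise scores discussed above can be sketched as follows. This is a minimal illustration that assumes pre-aligned sequences and skips gap columns; the function names and toy sequences are hypothetical, and the software actually used for the figures is not stated in the text.

```python
def percent_identity(a, b):
    """Percentage of identical positions between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)

def mean_identity_to_others(name, seqs):
    """Average identity of one isolate against every other isolate."""
    others = [s for n, s in seqs.items() if n != name]
    return sum(percent_identity(seqs[name], s) for s in others) / len(others)

def closest_to_consensus(seqs, cons):
    """Name of the isolate with the highest identity to the consensus."""
    return max(seqs, key=lambda n: percent_identity(seqs[n], cons))

# Hypothetical toy data, not values from the cohort
toy = {"iso_a": "GEKLDKWEKI", "iso_b": "GEKLDRWEKI", "iso_c": "GEQLDRWERI"}
toy_consensus = "GEKLDKWEKI"
print(closest_to_consensus(toy, toy_consensus))
print(mean_identity_to_others("iso_a", toy))
```

Ranking isolates by identity to the consensus and by mean identity to the other isolates mirrors the two criteria applied to the gag, pol and env segments.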

The pol sequences showed the highest values for the pairwise comparisons. Based
on the analysis of the phylogenetic tree shown in Figure 9 and the pairwise identity
score with the SA consensus (98.9%) shown in Figure 12, we chose the DU151 isolate
as the source of the pol gene. Other contributing factors in this decision were that this
is the same isolate that was chosen for the source of the env gene and that this was an
isolate with excellent growth properties in vitro. The actual pol gene clone from the
DU151 isolate was somewhat more divergent from the SA consensus sequence
(97.8%), and had a smaller average identity score when compared to the other isolates
(95.1%). However, we judged the small increase in distance from the consensus not
to be significant in this otherwise well conserved HIV-1 gene and therefore chose the
DU151 pol gene for further development. Only one of the recent seroconverter
sequences was less than 93% identical with the DU151 pol gene segment.

The env gene showed the greatest sequence diversity. Based on the analysis of the 
phylogenetic tree shown in Figure 10, we chose the DU151 env gene. The DU151 env 
gene segment shows an average pairwise comparison score with the other isolates of 
87.2%, with the clone being slightly higher (87.9%). The DU151 isolate gene segment 
has a pairwise identity score of 92.6% with the SA consensus while the DU151 clone is 
at 91.3%. Finally, all pairwise identity scores are above 83% with either the DU151
isolate sequence or the clone when compared to the other recent seroconverters, as 
shown in Figure 13. These pairwise scores make the DU151 sequence similar to the 
best scores in this sequence pool and combine these levels of similarity with an R5 
virus with good cell culture replication kinetics. 

The clones representing the full length gene for each of the above viral genes were
generated by PCR. Viral DNA present in cells infected with the individual isolates was
used for the pol and env clones, and DNA derived directly from plasma by RT-PCR
was used for the gag clone. Total DNA was extracted from the infected cell pellets
using the QIAGEN DNeasy Tissue Kit. This DNA was used in PCR reactions using the
following primers (HXBR numbering, Los Alamos database) in a nested PCR
amplification strategy:

gag: outer, 623-640 and 2391-2408; inner, 789-810 and 2330-2350.

pol: outer, 2050-2073 and 5119-5148; inner, 2085-2108 and 5068-5094.

env: outer, 6195-6218 and 8807-8830; inner, 6225-6245 and 8758-8795.
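
As a quick consistency check on the primer coordinates listed above, the sketch below confirms that each inner amplicon lies inside its outer amplicon and reports the expected product lengths. The dictionary layout and function names are illustrative assumptions; only the coordinates come from the text.

```python
# Primer coordinates as listed above: (forward start, forward end), (reverse start, reverse end)
PRIMERS = {
    "gag": {"outer": ((623, 640), (2391, 2408)), "inner": ((789, 810), (2330, 2350))},
    "pol": {"outer": ((2050, 2073), (5119, 5148)), "inner": ((2085, 2108), (5068, 5094))},
    "env": {"outer": ((6195, 6218), (8807, 8830)), "inner": ((6225, 6245), (8758, 8795))},
}

def amplicon_span(pair):
    """First and last genome position covered by a primer pair's product."""
    (fwd_start, _), (_, rev_end) = pair
    return fwd_start, rev_end

def amplicon_length(pair):
    """Length in bases of the product, primers included."""
    start, end = amplicon_span(pair)
    return end - start + 1

def is_nested(outer, inner):
    """True if the inner product lies entirely within the outer product."""
    o_start, o_end = amplicon_span(outer)
    i_start, i_end = amplicon_span(inner)
    return o_start <= i_start and i_end <= o_end

for gene, pairs in PRIMERS.items():
    print(gene, is_nested(pairs["outer"], pairs["inner"]),
          amplicon_length(pairs["outer"]), amplicon_length(pairs["inner"]))
```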

The PCR products were blunt-end cloned into pT7Blue using the Novagen pT7Blue
Blunt Kit. The inserts were characterized by doing colony PCR to identify clones with 
gene inserts. The identity of the insert was confirmed by sequencing the insert on both 
strands and comparing this sequence to the original sequence. 

Modification of clones

Several modifications were introduced to the cloned genes, as shown in Figures 23 to
28. In order to increase levels of expression of proteins, the DNA sequence was
resynthesized and the following modifications were made:

the codon usage was changed to reflect human codon usage for
increased expression; and

the inhibitory and rev responsive elements were also removed.

The modifications to the gag gene sequence of Du422 are shown in Sequence I.D. 
numbers 7 and 8. 

Also for the DNA, modified vaccinia Ankara (MVA) and BCG vaccines, the pol gene
was truncated so that only the protease, reverse transcriptase and RNase H regions of
the pol gene will be expressed. In addition, the active site amino acid motif YMDD
has been mutated to YMAA so that the expressed reverse transcriptase will be
catalytically inactive. The modifications to the pol gene of Du151 are shown in
Sequence I.D. numbers 9 and 10.
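
The active site substitution can be illustrated at the protein level with a short sketch. The function name is hypothetical, and the 20-residue fragment is taken from the consensus pol sequence in the claims only as an example of where the motif sits; the actual mutation was introduced during gene synthesis.

```python
def inactivate_rt(protein, motif="YMDD", replacement="YMAA"):
    """Replace the reverse transcriptase active site motif.

    Raises if the motif is absent or occurs more than once, so that an
    unexpected sequence is not silently altered.
    """
    count = protein.count(motif)
    if count != 1:
        raise ValueError(f"expected exactly one {motif} motif, found {count}")
    return protein.replace(motif, replacement)

# Fragment of the consensus pol sequence spanning the YMDD motif
fragment = "NPEIVIYQYMDDLYVGSDLE"
print(inactivate_rt(fragment))
```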

Synthetic genes 

The complete gag and env genes were resynthesized to optimise the codons for 
expression in human cells, also shown in Sequence I.D. numbers 9 to 12. During this 
process the inhibitory sequences (INS) and rev responsive elements (RRE) are
removed, which has been reported to result in increased expression. The gag gene
myristylation signal was mutated as described above and as shown in Sequence I.D. 
numbers 7 and 8. 
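
Codon optimisation of this kind can be sketched as a back-translation that assigns one preferred codon to each amino acid. The table below is an illustrative assumption (one commonly used human codon per residue), not the table used to design the synthetic genes, and real designs typically also screen for unwanted motifs.

```python
# One frequently used human codon per amino acid (illustrative assumption)
PREFERRED_CODON = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "CGG",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
    "*": "TGA",
}

def back_translate(protein):
    """Return a DNA sequence encoding `protein` using the preferred codons."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein)
    except KeyError as err:
        raise ValueError(f"unknown residue {err.args[0]!r}") from None

# Short stretch around the mutated YMAA motif of reverse transcriptase
print(back_translate("IYQYMAALYVG"))
```

Because only the codons change, the encoded protein is identical before and after optimisation.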

The following material has been deposited with the European Collection of Cell 
Cultures, Centre for Applied Microbiology and Research, Salisbury, Wiltshire SP4
0JG, United Kingdom (ECACC).

Material                    ECACC Deposit No.                        Deposit Date

HIV-1 Viral isolate Du151   Accession Number 00072724                27 July 2000
HIV-1 Viral isolate Du179   Accession Number 00072725                27 July 2000
HIV-1 Viral isolate Du422   Provisional Accession Number 00072726    27 July 2000
                            Provisional Accession Number 01032114    22 March 2001


The deposit was made under the provisions of the Budapest Treaty on the 
International Recognition of the Deposit of Microorganisms for the Purpose of Patent 
Procedure and regulations thereunder (Budapest Treaty). 

The indications made below relate to
the deposited microorganism(s) or
other biological material referred to
in the description on:

page 


J 




line 


1j 
Identification of Deposit




3-3-1 


Name of depositary institution 


European Collection of Cell Cultures


3-3-2 


Address of depositary institution 


Vaccine Research and Production 
Laboratory, Public Health Laboratory 
Service, Centre for Applied Microbiology 
and Research, Porton Down, Salisbury,
Wiltshire SP4 0JG, United Kingdom 


3-3-3 


Date of deposit 


22 March 2001 (22.03.2001) 


3-3-4 


Accession Number 


ECACC 01032114 


3-4 


Additional Indications 


NONE 


3-5 


Designated States for Which 
Indications are Made 


all designated States 


3-6 


Separate Furnishing of Indications 

These indications will be submitted to 
the International Bureau later 


NONE 


4 
The indications made below relate to
the deposited microorganism(s) or
other biological material referred to 
in the description on: 
page 


26 




line 


11 










Name of depositary institution

European Collection of Cell Cultures 


4-3-2 


Address of depositary institution 


Vaccine Research and Production 
Laboratory, Public Health Laboratory 
Service, Centre for Applied Microbiology 
and Research, Porton Down, Salisbury,
Wiltshire SP4 0JG, United Kingdom 


4-3-3 


Date of deposit 


27 July 2000 (27.07.2000) 


4-3-4 


Accession Number 


ECACC 00072726 


4-4 


Additional Indications 


NONE 


4-5 


Designated States for Which 
Indications are Made 


all designated States 


4-6 


Separate Furnishing of Indications 

These indications will be submitted to 
the International Bureau later 


NONE 



WO 02/04494 PCT/IB01/01208


PCT PA128340/PCT 
Original (for SUBMISSION) - printed on 06.07.2001 04:52:24 PM 

FOR RECEIVING OFFICE USE ONLY 



0-4 


This form was received with the 
international application: 

(yes or no) 


yes


0-4-1 


Authorized officer 




FOR INTERNATIONAL BUREAU USE ONLY 


0-5 


This form was received by the 
International Bureau on:




0-5-1 


Authorized officer 








CLAIMS 

1. A process for the selection of HIV subtype isolates for use in the development
of a prophylactic and/or therapeutic pharmaceutical composition comprising 
the following steps: 

isolating viruses from recently infected subjects; 

generating a consensus sequence for at least part of at least one HIV gene by 
identifying the most common codon or amino acid among the isolated viruses 
at each position along at least part of the gene; 

selecting the isolated virus or viruses with a high sequence identity to the 
consensus sequence, a phenotype which is associated with transmission for 
the particular HIV subtype. 

2. A process according to claim 1, wherein the isolated virus is of the same 
subtype as a likely challenge strain. 

3. A process according to either of claims 1 or 2, wherein the HIV subtype is HIV- 
1 subtype C. 

4. A process according to claim 3, wherein the phenotype which is associated 
with transmission is a virus that utilises the CCR5 co-receptor and is non 
syncytium inducing (NSI). 

5. An HIV-1 subtype C isolate, designated Du422 and assigned Provisional 
Accession Number 01032114 by the European Collection of Cell Cultures.

6. An HIV-1 subtype C isolate, designated Du151 and assigned Accession 
Number 00072724 by the European Collection of Cell Cultures.

7. An HIV-1 subtype C isolate, designated Du179 and assigned Accession 
Number 00072725 by the European Collection of Cell Cultures.



8. A molecule having:

(i) the nucleotide sequence set out in Sequence I.D. No 1;

(ii) an RNA sequence corresponding to the nucleotide sequence set out in
Sequence I.D. No 1;

(iii) a sequence which will hybridise to the nucleotide sequence set out in
Sequence I.D. No 1 or an RNA sequence corresponding to it, under 
strict hybridisation conditions; 

(iv) a sequence which is homologous to the nucleotide sequence set out in 
Sequence I.D. No 1 or an RNA sequence corresponding to it; or 

(v) a sequence which is a modification or derivative of the sequence of 
any one of (i) to (iv). 

9. A molecule according to claim 8, which has the modified sequence set out in 
Sequence I.D. No 7. 

10. A molecule having: 

(i) the nucleotide sequence set out in Sequence I.D. No 3;

(ii) an RNA sequence corresponding to the nucleotide sequence set out in
Sequence I.D. No 3; 

(iii) a sequence which will hybridise to the nucleotide sequence set out in 
Sequence I.D. No 3 or an RNA sequence corresponding to it, under 
strict hybridisation conditions; 

(iv) a sequence which is homologous to the nucleotide sequence set out in 
Sequence I.D. No. 3 or an RNA sequence corresponding to it; or 

(v) a sequence which is a modification or derivative of the sequence of 
any one of (i) to (iv). 

11. A molecule according to claim 10, which has the modified sequence set out in 
Sequence I.D. No. 9. 

12. A molecule having: 

(i) the nucleotide sequence set out in Sequence I.D. No. 5; 

(ii) an RNA sequence corresponding to the nucleotide sequence set out in 
Sequence I.D. No. 5; 

(iii) a sequence which will hybridise to the nucleotide sequence set out in
Sequence I.D. No. 5 or an RNA sequence corresponding to it, under 
strict hybridisation conditions; 

(iv) a sequence which is homologous to the nucleotide sequence set out in 
Sequence I.D. No. 5 or an RNA sequence corresponding to it; or 

(v) a sequence which is a modification or derivative of the sequence of 
any one of (i) to (iv). 

13. A molecule according to claim 12, which has the modified sequence set out in 
Sequence I.D. No. 11. 

14. A molecule having: 

(i) the nucleotide sequence set out in Sequence I.D. No. 13; 

(ii) an RNA sequence corresponding to the nucleotide sequence set out in 
Sequence I.D. No. 13; 

(iii) a sequence which will hybridise to the nucleotide sequence set out in 
Sequence I.D. No. 13 or an RNA sequence corresponding to it, under 
strict hybridisation conditions; 

(iv) a sequence which is homologous to the nucleotide sequence set out in 
Sequence I.D. No. 13 or an RNA sequence corresponding to it; or 

(v) a sequence which is a modification or derivative of the sequence of 
any one of (i) to (iv). 

15. A molecule according to claim 14, which has a modified sequence which has 
similar or the same modifications as those set out in Sequence I.D. No. 11 for 
the env gene of the isolate Du151.

16. A polypeptide having: 

(i) the amino acid sequence set out in Sequence I.D. No. 2; or 

(ii) a sequence which is a modification or derivative of the amino acid 
sequence set out in Sequence I.D. No. 2. 

17. A polypeptide according to claim 16, wherein the modified sequence is set out 
in Sequence I.D. No. 8. 

18. A polypeptide having:

(i) the amino acid sequence set out in Sequence I.D. No. 4; or 

(ii) a sequence which is a modification or derivative of the amino acid 
sequence set out in Sequence I.D. No. 4. 

19. A polypeptide according to claim 18, wherein the modified sequence is that set 
out in Sequence I.D. No. 10. 

20. A polypeptide having: 

(i) the amino acid sequence set out in Sequence I.D. No. 6; or 

(ii) a sequence which is a modification or derivative of the amino acid 
sequence set out in Sequence I.D. No. 6. 

21. A polypeptide according to claim 20, wherein the modified sequence is that set 
out in Sequence I.D. No. 12. 

22. A polypeptide having: 

(i) the amino acid sequence set out in Sequence I.D. No. 14; 

(ii) a sequence which is a modification or derivative of the amino acid 
sequence set out in Sequence I.D. No. 14. 

23. A polypeptide according to claim 22, wherein the modified sequence has 
similar or the same modifications as those set out in Sequence I.D. No. 12 for 
the amino acid sequence of the env gene of the isolate Du151.

24. A consensus amino acid sequence for the partial gag gene of HIV-1 subtype C 
which is: 



GEKLDKWEKI RLRPGGKKHY MLKHLVWASR ELERFALNPG LLETSEGCKQ 50
IMKQLQPALQ TGTEELRSLY NTVATLYCVH EKIEVRDTKE ALDKIEEEQN 100
KSQQ-CQQKT QQAKAADGG- KVSQNYPIVQ NLQGQMVHQA ISPRTLNAWV 150
EEKAFSP EVIPMFTALS EGATPQDLNT MLNTVGGHQA AMQMLKDTIN 200
EEAAEWDRLH PVHAGPIAPG QMREPRGSDI AGTTSTLQEQ IAWMTSNPPI 250
PVGDIYKRWI ILGLNKIVRM YSPVSILDIK QGPKEPFRDY VDRFFKTLRA
EQATQDVKNW MTD 313

25. A consensus amino acid sequence for the partial pol gene of HIV-1 subtype C
which is:

LTEEKIKALT AICEEMEKEG KITKIGPENP YNTPVFAIKK KDSTKWRKL- 50
VDFRELNKRT QDFWEVQLGI PHPAGLKKKK SVTVLDVGDA YFSVPLDEGF 100
RKYTAFTIPS INNETPGIRY QYNVLPQGWK GSPAIFQSSM TKILEPFRAK 150
NPEIVIYQYM DDLYVGSDLE IGQHRAKIEE LREHLLKWGF TTPDKKHQKE 200
PPFLWMGYEL HPDKWTVQPI QLPEKDSWTV NDIQKLVGKL NWASQIYPGI 250
KVRQLCKLLR GAKALTDIVP LTEEAELE 278

26. A consensus amino acid sequence for the partial env gene of HIV-1 subtype C
which is:

YCAPAGYAIL KCNNKTFNGT GPCNNVSTVQ CTHGIKPVVS TQLLLNGSLA 50
EEEIIIRSEN LTNNAKTIIV HLNESVEIVC TRPNNNTRKS IRIGPGQTFY 100
ATGDIIGDIR QAHCNISEGK WNKTLQKVKK KLKEELYKYK VVEIKPLGIA 150
PTEAKRRVVE REKRAVGIGA VFLGFLGAAG STMGAASITL TVQARQLLSG 200
IVQQQSNLLR AIEAQQHMLQ LTVWGIKQL 229

27. A process according to claim 1, substantially as herein described.

28. An HIV-1 subtype C isolate according to claim 5, substantially as herein
described.

29. An HIV-1 subtype C isolate according to claim 6, substantially as herein
described.

30. An HIV-1 subtype C isolate according to claim 7, substantially as herein
described.



31. A molecule according to claim 8, substantially as herein described.

32. A molecule according to claim 10, substantially as herein described.

33. A molecule according to claim 12, substantially as herein described. 

34. A molecule according to claim 14, substantially as herein described. 

35. A polypeptide according to claim 16, substantially as herein described. 

36. A polypeptide according to claim 18, substantially as herein described. 

37. A polypeptide according to claim 20, substantially as herein described. 

38. A polypeptide according to claim 22, substantially as herein described. 

39. A consensus amino acid sequence according to claim 24, substantially as herein
described.

40. A consensus amino acid sequence according to claim 25, substantially as herein 
described. 



described. 

FIGURE 4

FIGURE 5

SEQUENCE LISTING



SEQUENCE I.D. No 7: Du422 synthesised gag gene 



1 GGGGGATGTGCTGCAAGGCGATTAAGTTGGGTAACGCCAGGGTTTTCCCAATCACGACGT 
61 TGTAAAACGACAGCCAATGAATTGAAGCTTATGGCTGCTCGCGCATCTATCCTCAGAGGC 
121 GAAAAGTTGGATAAGTGGGAAAAAATCAGACTCAGGCCAGGAGGTAAAAAACACTACATG
181 CTGAAGCATATCGTGTGGGCATCTAGGGAGTTGGAGAGATTTGCACTGAACCCCGGACTG
241 CTGGAAACCTCAGAGGGCTGTAAGCAAATCATGAAACAGCTCCAACCAGCCTTGCAGACC
301 GGAACAGAAGAGCTGAAGTCCCTTTACAATACCGTGGCAACCCTCTATTGCGTCCACGAG 
361 AAGATCGAGGTGAGAGACACAAAGGAGGCCCTGGACAAAATCGAGGAGGAGCAGAATAAG 
421 TGCCAGCAGAAGACCCAGCAGGCAAAGGCTGCTGACGGAAAGGTCTCTCAGAACTATCCT
481 ATCGTTCAGAACCTTCAGGGGCAGATGGTGCACCAAGCAATCAGCCCTAGAACCCTGAAC 
541 GCATGGGTGAAGGTGATCGAGGAGAAAGCCTTTTCTCCCGAGGTTATCCCCATGTTTACC 
601 GCCCTGAGCGAAGGCGCCACTCCTCAAGACCTGAACACTATGCTGAACACAGTGGGAGGA 
661 CACCAGGCCGCTATGCAGATGTTGAAGGATACCATCAACGAGGAGGCAGCCGAATGGGAC 
721 CGCCTCCACCCCGTGCACGCCGGACCTATCGCCCCCGGACAAATGAGAGAACCTCGCGGA 
781 AGTGATATTGCCGGTACTACCAGCACCCTTCAAGAGCAGATTGCTTGGATGACCAGCAAC
841 CCACCCATCCCAGTGGGCGATATTTACAAAAGGTGGATTATTCTGGGGCTGAACAAAATT 
901 GTGAGAATGTACTCCCCCGTCTCCATCCTCGACATCCGCCAAGGACCCAAGGAGCCTTTT 
961 AGGGATTACGTGGACAGATTCTTCAAAACCCTTAGAGCTGAGCAAGCCACTCAGGAGGTT
1021 AAGAACTGGATGACAGATACTCTGCTCGTGCAAAACGCTAACCCCGATTGCAAAACCATC
1081 TTGAGAGCTCTCGGTCCAGGTGCCACCCTTGAGGAAATGATGACAGCATGTCAAGGCGTG 
1141 GGAGGACCTGGGCACAAGGCCAGAGTTCTCGCTGAGGCCATGAGCCAGACAAACTCAGGC 
1201 AATATCATGATGCAGAGGAGTAACTTTAAGGGTCCCAGGAGAATCGTCAAGTGCTTCAAT
1261 TGTGGCAAGGAGGGTCACATTGCCAGGAACTGCCGCGCCCCCAGGAAGAAAGGCTGCTGG 
1321 AAGTGTGGCAAAGAGGGCCACCAGATGAAGGATTGCACCGAGCGCCAAGCAAACTTCCTG 
1381 GGAAAGATTTGGCCCAGTCATAAGGGCCGCCCTGGCAACTTCCTTCAAAACAGACCCGAG 
1441 CCTACCGCCCCCCCCGCTGAGTCTTTCAGATTTGAGGAGACCACCCCCGCTCCAAAGCAG 
1501 GAGCCAATTGAGAGAGAGCCTCTCACCAGTCTCAAAAGCCTCTTTGGTAGCGACCCCCTC 
1561 AGCCAATAAGAATTCTAGCTTGGCGTAATCATGGTCATAGCTGTTTCCTGTGTGAAATTG 
1621 TTATCAGCTCACAATTCCACACAACATACGAGCCGGAAGCATAAAGTGTAAAGCCTGGGA 
1681 TGCCTAATGAGTGAGCTAACTCACATTAGTTGCGTTGCGCTCACTGCCCGCTTTCCAGTC 
1741 GGGAAACCTGTCGTGCCAGCTCCATTAGTGAATCGTCCAACGCACGGGGAGAGGCGGTTT 
1801 GCGTATTGGGCGCACTTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGTTCGTTCGGCT 
1861 GCGGCGAGCCGTATCAGCTCACTCAAAGGCGGTAATACGGTTATC 

SEQUENCE I.D. No 8: Du422 synthesised Gag Protein 

1 GGGGGATGTGCTGCAAGGCGATTAAGTTGGGTAACGCCAGGGTTTTCCCAATCACGACGT

1 GGCAARRLSWVTPGFSQSRR

61 TGTAAAACGACAGCCAATGAATTGAAGCTTATGGCTGCTCGCGCATCTATCCTCAGAGGC

21 CKTTANELKLMAARASILRG

121 GAAAAGTTGGATAAGTGGGAAAAAATCAGACTCAGGCCAGGAGGTAAAAAACACTACATG

41 EKLDKWEKIRLRPGGKKHYM

181 CTGAAGCATATCGTGTGGGCATCTAGGGAGTTGGAGAGATTTGCACTGAACCCCGGACTG

61 LKHIVWASRELERFALNPGL

241 CTGGAAACCTCAGAGGGCTGTAAGCAAATCATGAAACAGCTCCAACCAGCCTTGCAGACC

81 LETSEGCKQIMKQLQPALQT

301 GGAACAGAAGAGCTGAAGTCCCTTTACAATACCGTGGCAACCCTCTATTGCGTCCACGAG

101 GTEELKSLYNTVATLYCVHE

361 AAGATCGAGGTGAGAGACACAAAGGAGGCCCTGGACAAAATCGAGGAGGAGCAGAATAAG

121 KIEVRDTKEALDKIEEEQNK

421 TGCCAGCAGAAGACCCAGCAGGCAAAGGCTGCTGACGGAAAGGTCTCTCAGAACTATCCT

141 CQQKTQQAKAADGKVSQNYP

481 ATCGTTCAGAACCTTCAGGGGCAGATGGTGCACCAAGCAATCAGCCCTAGAACCCTGAAC

161 IVQNLQGQMVHQAISPRTLN

541 GCATGGGTGAAGGTGATCGAGGAGAAAGCCTTTTCTCCCGAGGTTATCCCCATGTTTACC

181 AWVKVIEEKAFSPEVIPMFT

601 GCCCTGAGCGAAGGCGCCACTCCTCAAGACCTGAACACTATGCTGAACACAGTGGGAGGA

201 ALSEGATPQDLNTMLNTVGG

661 CACCAGGCCGCTATGCAGATGTTGAAGGATACCATCAACGAGGAGGCAGCCGAATGGGAC

221 HQAAMQMLKDTINEEAAEWD

721 CGCCTCCACCCCGTGCACGCCGGACCTATCGCCCCCGGACAAATGAGAGAACCTCGCGGA

241 RLHPVHAGPIAPGQMREPRG

781 AGTGATATTGCCGGTACTACCAGCACCCTTCAAGAGCAGATTGCTTGGATGACCAGCAAC

261 SDIAGTTSTLQEQIAWMTSN

841 CCACCCATCCCAGTGGGCGATATTTACAAAAGGTGGATTATTCTGGGGCTGAACAAAATT

281 PPIPVGDIYKRWIILGLNKI

901 GTGAGAATGTACTCCCCCGTCTCCATCCTCGACATCCGCCAAGGACCCAAGGAGCCTTTT

301 VRMYSPVSILDIRQGPKEPF

961 AGGGATTACGTGGACAGATTCTTCAAAACCCTTAGAGCTGAGCAAGCCACTCAGGAGGTT

321 RDYVDRFFKTLRAEQATQEV

1021 AAGAACTGGATGACAGATACTCTGCTCGTGCAAAACGCTAACCCCGATTGCAAAACCATC

341 KNWMTDTLLVQNANPDCKTI

1081 TTGAGAGCTCTCGGTCCAGGTGCCACCCTTGAGGAAATGATGACAGCATGTCAAGGCGTG

361 LRALGPGATLEEMMTACQGV

1141 GGAGGACCTGGGCACAAGGCCAGAGTTCTCGCTGAGGCCATGAGCCAGACAAACTCAGGC

381 GGPGHKARVLAEAMSQTNSG

1201 AATATCATGATGCAGAGGAGTAACTTTAAGGGTCCCAGGAGAATCGTCAAGTGCTTCAAT

401 NIMMQRSNFKGPRRIVKCFN

1261 TGTGGCAAGGAGGGTCACATTGCCAGGAACTGCCGCGCCCCCAGGAAGAAAGGCTGCTGG

421 CGKEGHIARNCRAPRKKGCW

1321 AAGTGTGGCAAAGAGGGCCACCAGATGAAGGATTGCACCGAGCGCCAAGCAAACTTCCTG

441 KCGKEGHQMKDCTERQANFL

1381 GGAAAGATTTGGCCCAGTCATAAGGGCCGCCCTGGCAACTTCCTTCAAAACAGACCCGAG

461 GKIWPSHKGRPGNFLQNRPE

1441 CCTACCGCCCCCCCCGCTGAGTCTTTCAGATTTGAGGAGACCACCCCCGCTCCAAAGCAG

481 PTAPPAESFRFEETTPAPKQ

1501 GAGCCAATTGAGAGAGAGCCTCTCACCAGTCTCAAAAGCCTCTTTGGTAGCGACCCCCTC

501 EPIEREPLTSLKSLFGSDPL

1561 AGCCAATAAGAATTCTAGCTTGGCGTAATCATGGTCATAGCTGTTTCCTGTGTGAAATTG

521 SQ*EF*LGVIMVIAVSCVKL

1621 TTATCAGCTCACAATTCCACACAACATACGAGCCGGAAGCATAAAGTGTAAAGCCTGGGA

541 LSAHNSTQHTSRKHKV*SLG

1681 TGCCTAATGAGTGAGCTAACTCACATTAGTTGCGTTGCGCTCACTGCCCGCTTTCCAGTC

561 CLMSELTHISCVALTARFPV

1741 GGGAAACCTGTCGTGCCAGCTCCATTAGTGAATCGTCCAACGCACGGGGAGAGGCGGTTT

581 GKPVVPAPLVNRPTHGERRF

1801 GCGTATTGGGCGCACTTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGTTCGTTCGGCT

601 AYWAHFRFLAH*LAALVRSA

1861 GCGGCGAGCCGTATCAGCTCACTCAAAGGCGGTAATACGGTTATC

621 AASRISSLKGGNTVI



SEQUENCE I.D. No 9: Du151 synthesised pol gene

1 TCGCGCGTTT CGGTGATGAC GGTGAAAACC TCTGACACAT GCAGCTCCCG 

51 GAGACGGTCA CAGCTTGTCT GTAAGCGGAT GCCGGGAGCA GACAAGCCCG 

101 TCAGGGCGCG TCAGCGGGTG TTGGCGGGTG TCGGGGCTGG CTTAACTATG 

151 CGGCATCAGA GCAGATTGTA CTGAGAGTGC ACCATATGCG GTGTGAAATA 

201 CCGCACAGAT GCGTAAGGAG AAAATACCGC ATCAGGCGCC ATTCGCCATT 

251 CAGGCTGCGC AACTGTTGGG AAGGGCGATC GGTGCGGGCC TCTTCGCTAT 

301 TACGCCAGCT GGCGAAAGGG GGATGTGCTG CAAGGCGATT AAGTTGGGTA 

351 ACGCCAGGGT TTTCCCAGTC ACGACGTTGT AAAACGACGG CCAGTGCCAA 

401 GCTTGCATGC CTGCAGGTCG ACTCTAGAGG ATCCCCGGGT ACCGAGCTCC

Bgll (join to Gag for Gag-pol) 



4 51 TTCCCACAAG GGCCGGCCAG GCAATTTCCT TCAGAACAGA CCAGAGCCAA 

501 CAGCCCCACC AGCAGAGAGC TTCAGGTTCG AAGAGACAAC CCCCGCTCCG 

551 AAACAGGAGC CGAGAGAAAG GGAACCCTTA ACTTCCCTCA AATCACTCTT 

601 TGGCAGCGAC CCCTTGTCTC AATAAAAATC GGCGGCCAGA CCCGGGAGGC 

651 CCTGCTGGAC ACCGGCGCCG ACGACACCGT GCTGGAGGAC ATCAACCTGC 

701 CCGGCAAGTG GAAGCCCAAG ATGATCGGCG GCATCGGCGG CTTCATCAAG 

751 GTGCGGCAGT ACGACCAGAT CCTGATCGAG ATCTGCGGCA AGAAGGCCAT

801 CGGCACCGTG CTGGTGGGCC CCACCCCCGT GAACATCATC GGCCGGAACA 

851 TGCTGACCCA GCTGGGCTGC ACCCTGAACT TCCCCATCAG CCCCATCGAG 

901 ACCGTGCCCG TGAAGCTGAA GCCCGGCATG GACGGCCCCA AGGTGAAGCA 

951 GTGGCCCCTG ACCGAGGTGA AGATCAAGGC CCTGACCGCC ATCTGCGAGG 

1001 AGATGGAGAA GGAGGGCAAG ATCACCAAGA TCGGCCCCGA GAACCCCTAC 

1051 AACACCCCCA TCTTCGCCAT CAAGAAGGAG GACAGCACCA AGTGGCGGAA 

1101 GCTGGTGGAC TTCCGGGAGC TGAACAAGCG GACCCAGGAC TTCTGGGAGG 

1151 TGCAGCTGGG CATCCCCCAC CCCGCCGGCC TGAAGAAGAA GAAGAGCGTG 

1201 ACCGTGCTGG ACGTGGGCGA CGCCTACTTC AGCGTGCCCC TGGACGAGGG 

1251 CTTCCGGAAG TACACCGCCT TCACCATCCC CAGCATCAAC AACGAGACCC

1301 CCGGCATCCG GTACCAGTAC AACGTGCTGC CCCAGGGCTG GAAGGGCAGC 

1351 CCCGCCATCT TCCAGGCCAG CATGACCAAG ATCCTGGAGC CCTTCCGGGC 

1401 CAAGAACCCC GAGATCGTGA TCTACCAGTA CATGGCCGCC CTGTACGTGG

1451 GCAGCGACCT GGAGATCGGC CAGCACCGGG CCAAGATCGA GGAGCTGCGG

1501 GAGCACCTGC TGAAGTGGGG CTTCACCACC CCCGACAAGA AGCACCAGAA 

1551 GGAGCCCCCC TTCCTGTGGA TGGGCTACGA GCTGCACCCC GACAAGTGGA 

1601 CCGTGCAGCC CATCCAGCTG CCCGAGAAGG ACAGCTGGAC CGTGAACGAC 

1651 ATCCAGAAGC TGGTGGGCAA GCTGAACTGG ACCAGCCAGA TCTACCCCGG 

1701 CATCAAGGTG CGGCAGCTGT GCAAGCTGCT GCGGGGCACC AAGGCCCTGA

1751 CCGACATCGT GCCCCTGACC GAGGAGGCCG AGCTGGAGCT GGCCGAGAAC

1801 CGGGAGATCC TGAAGGAGCC CGTGCACGGC GTGTACTACG ACCCCAGCAA 

1851 GGACCTGATC GCCGAGATCC AGAAGCAGGG CGACGACCAG TGGACCTACC

1901 AGATCTACCA GGAGCCCTTC AAGAACCTGA AAACCGGCAA GTACGCCAAG

1951 CGGCGGACCA CCCACACCAA CGACGTGAAG CAGCTGACCG AGGCCGTGCA 

2001 GAAGATCAGC CTGGAGAGCA TCGTGACCTG GGGCAAGACC CCCAAGTTCC

2051 GGCTGCCCAT CCAGAAGGAG ACCTGGGAGA TCTGGTGGAC CCACTACTGG

2101 CAGGCCACCT GGATCCCCGA GTGGGAGTTC GTGAACATCC CCCCCCTGGT 

2151 GAAGCTGTGG TACCAGCTGG AGAAGGAGCC CATCGCCGGC GCCGAGACCT

2201 TCTACGTGGA CGGCGCCGCC AACCGGGAGA CCAAGATTGG CAAGGCCGGC

2251 TACGTGACCG ACCGGGGCCG GCAGAAGATC GTGACCCTGA GCGAGACCAC

2301 CAACCAGAAA ACCGAGCTGC AGGCCATCCA GCTGGCCCTG CAGGACAGCG 

2351 AGAGCGAGGT GAACATCGTG ACCGACAGCC AGTACGCTCT GGGCATCATC

2401 CAGGCCCAGC CCGACCGGAG CGAGAGCGAG CTGGTGAACC AGATCATCGA

2451 GCAGCTGATC AAGAAGGAGC GGGCCTACCT GAGCTGGGTG CCCGCCCACA

2501 AGGGCATCGG CGGCGACGAG CAGGTGGACA AGCTGGTGAG CAGCGGCATC 

2551 CGGAAGGTGC TGTGATCTAG AGAATTC 



SEQUENCE I.D. No 10: Du151 synthesised Pol Protein

1 SRVSVMTVKT SDTCSSRRRS QLVCKRMPGA DKPVRARQRV LAGVGAGLTM RHQSRLY*EC

61 TICGVKYRTD A*GENTASGA IRHSGCATVG KGDRCGPLRY YASWRKGDVL QGD*VG*RQG

121 FPSHDVVKRR PVPSLHACRS TLEDPRVPSS FPQGPARQFP SEQTRANSPT SRELQVRRDN

181 PRSETGAERK GTLNFPQITL WQRPLVSIKI GGQTREALLD TGADDTVLED INLPGKWKPK

241 MIGGIGGFIK VRQYDQILIE ICGKKAIGTV LVGPTPVNII GRNMLTQLGC TLNFPISPIE

301 TVPVKLKPGM DGPKVKQWPL TEVKIKALTA ICEEMEKEGK ITKIGPENPY NTPIFAIKKE 

361 DSTKWRKLVD FRELNKRTQD FWEVQLGIPH PAGLKKKKSV TVLDVGDAYF SVPLDEGFRK 

421 YTAFTIPSIN NETPGIRYQY NVLPQGWKGS PAIFQASMTK ILEPFRAKNP EIVIYQYMAA

481 LYVGSDLEIG QHRAKIEELR EHLLKWGFTT PDKKHQKEPP FLWMGYELHP DKWTVQPIQL

541 PEKDSWTVND IQKLVGKLNW TSQIYPGIKV RQLCKLLRGT KALTDIVPLT EEAELELAEN 

601 REILKEPVHG VYYDPSKDLI AEIQKQGDDQ WTYQIYQEPF KNLKTGKYAK RRTTHTNDVK 

661 QLTEAVQKIS LESIVTWGKT PKFRLPIQKE TWEIWWTDYW QATWIPEWEF VNTPPLVKLW 

721 YQLEKEPIAG AETFYVDGAA NRETKIGKAG YVTDRGRQKI VTLSETTNQK TELQAIQLAL

781 QDSESEVNIV TDSQYALGII QAQPDRSESE LVNQIIEQLI KKERAYLSWV PAHKGIGGDE

841 QVDKLVSSGI RKVL* 



SEQUENCE I.D. No 11: Du151 synthesised env Gene

1 AAGCTTATGA GGGTTATGGG GATTCAGAGA AACTGGCCTC AGTGGTGGAT TTGGGGGACA 

61 TTGGGATTTT GGATGATCAT CATCTGTCGC GTCGTGGGCA ACCTGAACCT GTGGGTCACT 

121 GTCTACTATG GAGTGCCAGT TTGGAAGGAA GCCAAGACAA CTCTGTTTTG CGCCAGCGAC 

181 GCCAAGGCTT ATGACAAGGA AGTCCACAAC GTGTGGGCCA CCCACGCATG TGTCCCAACC 

241 GACCCCAACC CACGCGAAAT CGTGCTGGAA AACGTCACAG AAAATTTCAA CATGTGGAAA 

301 AACGATATGG TGGATCAGAT GCATGAGGAT ATTATTAGCC TCTGGGACCA GTCTCTGAAG

361 CCATGTGTGA AGTTGACACC TCTCTGTGTG ACCCTTAACT GTACTAACGC CCCCGCCTAT 

421 AACAACTCTA TGCACGGGGA GATGAAAAAC TGTTCCTTCA ACACCACCAC CGAAATCAGG

481 GACAGAAAAC AGAAAGCCTA TGCCCTGTTC TATAAGCCCG ATGTGGTGCC ACTTAACCGC

541 CGCGAAGAAA ATAATGGTAC TGGCGAATAT ATTCTGATTA ACTGTAACAG CTCTACAATT

601 ACTCAGGCTT GCCCTAAAGT CACCTTTGAC CCAATCCCAA TCCACTACTG CGCCCCTGCA 

661 GGATACGCTA TCCTGAAATG CAATAATAAG ACCTTCAACG GAACTGGACC CTGCAATAAC 

721 GTGTCTACAG TGCAATGTAC CCACGGCATT ATGCCCGTCG TCTCCACCCA ACTGCTGCTC 

781 AATGGCAGCT TGGCAGAAGA GGAGATCATT ATTAGGAGCG AAAACCTCAC CAACAATATC

841 AAGACAATCA TCGTGCACCT GAACAAGTCT GTGGAAATTG TGTGTACCAG GCCCAATAAC 

901 AACACCAGGA AGAGCATCCG CATCGGACCT GGACAAACTT TCTACGCCAC CGGCGAAATC 

961 ATCGGGAACA TTAGAGAAGC CCACTGCAAC ATCTCTAAGA GCAATTGGAC ATCTACATTG 

1021 GAGCAAGTGA AAAAAAAGCT GAAAGAGCAC TACAATAAGA CCATCGAGTT CAACCCTCCT 

1081 TCCGGCGGCG ATCTGGAGGT CACAACACAC TCCTTTAACT GTAGGGGGGA GTTCTTTTAC 

1141 TGCAACACAA CAAAGCTGTT TAGCAACAAC TCCGACAGCA ATAATGAGAC TATCACCCTG 

1201 CCTTGCAAGA TCAAGCAAAT CATTAACATG TGGCAGAAAG TGGGAAGGGC AATGTATGCA 

1261 CCTCCCATCG AGGGCAACAT CACATGCAAG TCTAATATCA CCGGCCTGTT GCTGACTAGA 

1321 GACGGTGGCA AGAATACTAC TAACGAAATC TTCAGGCCAG GTGGAGGGAA CATGAAAGAT 
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1381 AATTGGCGCT CCGAACTGTA TAAGTACAAG 

14 41 CCCACAAAGT CTAAGCGCCG CGTGGTGGAA 

1501 GTGCTGCTGG GGTTCTTGGG TGCCGCTGGG 

1561 ACCGTGCAAG CTAGGCAGCT GCTGTCCGGT 

1621 GCTATCGAGG CCCAGCAGCA TATGCTGCAA 

1681 ACTCGCGTCC TGGCAATCGA ACGCTACCTG 

17 41 TGCTCCGGTA AGATCATCTG TACCACAGCC 
1801 AGCCAAGAGG ATATTTGGGA TAATATGACC 

18 61 TACACAGGAA CC ATTTATAG GCTCCTGGAA 
1921 AAGGACTTGC TCGCCCTGGA TAGCTGGAAA 
1981 TGGCTTTGGT ACATTAAGAT TTTCATCATG 
2041 ATCTTCGGGG TGCTTGCCAT TGTGAAAAGG 
2101 CAGACCTTGA CTCCAAGCCC ACGCGGACCC 
2161 GGCGAACAGG ATAAGGACCG CTCCATCAGA 
2 221 GATGATCTGA GGAGCCTGTG CCTCTTCTCC 
2 281 GCAGCTAGGG CTGCTGAGTT GCTGGGACGC 
2 341 GAGGCACTGA AGTACCTCGG GAACCTTGTG 
24 01 GCCATCAAGC TGTTCGACAC CATCGCAATC 
24 61 GAGGTCATTC AGAGGATCTG TCGCGCCATC 
2521 TTCGAGGCAG CACTGCAATG ATAGTTAATT 



GTGGTGGAGA TTGAGCCCCT CGGCGTCGCC 
AGAGAGAAGA GGGTTGTCGG CCTCGGCGCA 
TCTACAATGG GCG TTGCCTC TATTACACTC 
ATTGTGCAAC AAC-.GAGCAA TCTCTTGAGA 
CTTACAGTGT GGG3TATTAA GCAGCTGCAA 
AAAGACCAGC AACTCCTGGG TCTGTGGGGC 
GTGCCCTGGA ACA3CAGCTG GTCCAATAAG 
TGGATGCAAT GGGATAGAGA GATCAGCAAC 
GATTCTCAGA ACC.-.GCAGGA GAAGAACGAG 
AACCTGTGGA ATT 3GTTTAA CATCACCAAC 
ATTGTGGGAG GCT7GATCGG CCTGAGGATT 
GTCAGACAAG GATACTCCCC ATTGTCCTTT 
GACAGGTTGG GCA3GATCGA GGAGGAAGGA 
CTTGTTAGCG GGTTTCTGGC CCTGGCCTGG 
TATCACCACC TCC GCGATTT CATCCTCATT 
TCCTCCCTGA GAGGTCTCCA GAGAGGCTGG 
CAATACGGCG GGCTGGAGCT GAAAAGATCC 
GCCGTTGCAG AGG3CACCGA CAGGATCTTG 
CGCCACATCC CCATCAGGAT CAGACAAGGA 
AAACGCGTGG ATCT 



SEQUENCE I.D. No 12: Du151 synthesised Env Protein

KLMRVMGIQRNWPQWWIWGTLGFWMIIICRVVGNLNLWVTVYYGVPVWKEAKTTLFCASD
AKAYDKEVHNVWATHACVPTDPNPREIVLENVTENFNMWKNDMVDQMHEDIISLWDQSLK
PCVKLTPLCVTLNCTNAPAYNNSMHGEMKNCSFNTTTEIRDRKQKAYALFYKPDVVPLNR
REENNGTGEYILINCNSSTITQACPKVTFDPIPIHYCAPAGYAILKCNNKTFNGTGPCNN
VSTVQCTHGIMPVVSTQLLLNGSLAEEEIIIRSENLTNNIKTIIVHLNKSVEIVCTRPNN
NTRKSIRIGPGQTFYATGEIIGNIREAHCNISKSNWTSTLEQVKKKLKEHYNKTIEFNPP
SGGDLEVTTHSFNCRGEFFYCNTTKLFSNNSDSNNETITLPCKIKQIINMWQKVGRAMYA
PPIEGNITCKSNITGLLLTRDGGKNTTNEIFRPGGGNMKDNWRSELYKYKVVEIEPLGVA
PTKSKRRVVEREKRAVGLGAVLLGFLGAAGSTMGAASITLTVQARQLLSGIVQQQSNLLR
AIEAQQHMLQLTVWGIKQLQTRVLAIERYLKDQQLLGLWGCSGKIICTTAVPWNSSWSNK
SQEDIWDNMTWMQWDREISNYTGTIYRLLEDSQNQQEKNEKDLLALDSWKNLWNWFNITN
WLWYIKIFIMIVGGLIGLRIIFGVLAIVKRVRQGYSPLSFQTLTPSPRGPDRLGRIEEEG
GEQDKDRSIRLVSGFLALAWDDLRSLCLFSYHHLRDFILIAARAAELLGRSSLRGLQRGW
EALKYLGNLVQYGGLELKRSAIKLFDTIAIAVAEGTDRILEVIQRICRAIRHIPIRIRQG
FEAALQ**LIKRVD*

SEQUENCE I.D. No 13: Du179 Env Gene (non-humanised)

AGGCTAATTTTTTAGGGAAAATTTGGCCTTCCCACAAGGGGAGGCCAGGGAATTTCCTTCAGAGCAGGCCAATGAGAGT 
GAGGGGGATACAGAGGAATTGGCCACAATGGTGGATATGGGGCATCTTAGGCTTTTGGATGTTAATGATTTGTAGTGGG 
GTGGGAAACTTGTGGGTCACAATCTATTATGGGGTACCTGTGTGGAGAGAAGCAAAAACTACTCTATTCTGTGCATCAG 
ATGCTAAAGCATATGATAGAGAAGTGCATAATGTCTGGGCTACACATGCCTGTGTACCCACAGACCCCAACCCACAAGA 
AATAGTTATGGGAAATGTAACAGAAAATTTTAACATGTGGAAAAATGACATGGTGGATCAGATGCATGAGGATATAATC 
AATTTATGGGATCAAAGCCTAAAGCCATGTGTAAAGTTAACCCCACTCTGTGTCACTTTAAAATGTAGTACCTATAATG 
GTAGTGATACCAACGATATGAGAAATTGCTCTTTCAATACAACTACAGAAATAAGGGACAAGAAACAGACAGTGTATGC 
ACTTTTTTATAAACCTGATATAGTACCAATTAATGAGAGTGAGTATATATTAATACATTGCAATACCTCAACCATAACA 
CAAGCCTGTCCAAAGGTCTCTTTTGACCCAATTCCTATACATTATTGTGCTCCAGCTGGTTATGCGATTCTAAAGTGTA 
ATAATAAGACATTCAATGGGACGGGACCATGCCAAAATGTCAGCACAGTACAATGCACACATGGAATTAAGCCAGTAGT 
ATCAACTCAACTACTGTTAAATGGTAGCATAGCAGAAGGAGAGATAATAATTAGATCTGAAAATCTGACAAACAATGTT 
AAAACAATAATAGTACACCTTAATGAATCTATAGGAATTGTGTGTACAAGACCCGGCAATAATACAAGAAAAAGTATAA 
GGATAGGACCAGGACAAGCATTCTATACAAATCACATAATAGGAGATATAAGACAAGCATATTGTAACATTAGTAAACA
AGAATGGAACAAAACTTTAGAAGAGGTGAGAAAAAAATTGCAAGAACACTTCCCAAATAAAACAATAAAATTTAACTCA 
TCCTCAGGAGGGGACCTAGAAATTACAACACATAGCTTTAATTGCAGAGGAGAATTTTTCTATTGCAATACATCAAAAC 
TATTTAATGATAGTCTAGTAAATGATACAGAAAGTAATTCAACCATCACTATTCCATGCAGAATAAAACAAATTATAAA 
CATGTGGCAGGAGGTAGGACGAGCAATGTATGCCCCTCCCATTGCAGGAAACATAACATGTAAATCAAATATCACAGGA 
CTACTATTGACACGTGATGGAGGAACAGATAACACAACAGAGATATTCAGACCTGGAGGAGGAAATATGAAGGACAATT
GGAGAAGTGAATTATATAAATATAAAGTAGTAGAAATTAAGCCATTGGGAATAGCACCCACTGAAGCAAAAAGGAGAGT
GGTGGAGAGAGAAAAAAGAGCAGTGGGAATAGGAGCTGTGCTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGC 
GCGGCGTCAATAACGCTGACGGTACAGGCCAGACAACTGTTGTCTGGTATAGTGCAACAGCAAAGCAATTTGCTGAGAG 
CTATAGAGGCGCAACAGCATATGTTGCAACTCACAGTCTGGGGCATTAAGCAGCTCCAGACAAGAGTCCTGGCTATAGA 
AAGATACCTAAAGGATCAACAGCTCCTAGGACTTTGGGGCTGCTCTGGAAAACTCATCTGCACCACTAATGTGCCTTGG 
AACTCCAGTTGGAGCAATAAATCTCAACAAGCTATTTGGGATAACATGACATGGATGCAGTGGGATAGAGAAATTAATA 
ATTACACAAACATAATATACCAGTTGCTTGAGGACTCGCAAATCCAGCAGGAACAGAATGAAAAAGATTTATTAGCATT 
GGACAAGTGGCAAAATCTGTGGAGTTGGTTTAGCATAACAAATTGGCTATGGTATATAAAAATATTCATAATGATAGTA 
GGAGGCTTAATAGGTTTAAGAATAATTTTTGCTGTGCTATCTATAGTAAATAGAGTTAGGCAGGGATACTCACCTTTGT 
CGTTTCAGACCCTTACCCCAAACCCGAGGGGACCCGACAGGCTCGGAGAAATCGAAGAAGAAGGTGGAGAGCAAGACAG 
AGACAGATCCGTTCGATTAGTGAGCGGATTCTTACCACTTGCCTGGGACGATCTGCGGAGCCTGTGCCTCTTCAGCTAC 
CACCGATTGAGAGACTTCATATTCGATTGCAGCGAGGACAGTGGAACTTCTGGGACGCAGCAGTCTCAGGGGACTCCAG 
AGGGGTGGGAAGTCCTTAAATATCTGGGAAGCCTTGTGCAGTATTGGGGTCTGGAGCTAAAAAGAGTGCTATTAGTCTG 
CTTGATACCCATAGCAATAGCAGTAGCTGAAGGAACAGATAGGATTATTGAATTAGTACTAAGATTTTGTAGAGCTATC 
CGCAACATACCTACAAGAGTAAGACAGGGCTGTGAAGCAGCTTTGCTATAA 



SEQUENCE I.D. No 14: Du179 Env Protein

1 ANFLGKIWPS HKGRPGNFLQ SRPMRVRGIQ RNWPQWWIWG ILGFWMLMIC SGVGNLWVTI

61 YYGVPVWREA KTTLFCASDA KAYDREVHNV WATHACVPTD PNPQEIVMGN VTENFNMWKN

121 DMVDQMHEDI INLWDQSLKP CVKLTPLCVT LKCSTYNGSD TNDMRNCSFN TTTEIRDKKQ

181 TVYALFYKPD IVPINESEYI LIHCNTSTIT QACPKVSFDP IPIHYCAPAG YAILKCNNKT

241 FNGTGPCQNV STVQCTHGIK PVVSTQLLLN GSIAEGEIII RSENLTNNVK TIIVHLNESI

301 GIVCTRPGNN TRKSIRIGPG QAFYTNHIIG DIRQAYCNIS KQEWNKTLEE VRKKLQEHFP

361 NKTIKFNSSS GGDLEITTHS FNCRGEFFYC NTSKLFNDSL VNDTESNSTI TIPCRIKQII

421 NMWQEVGRAM YAPPIAGNIT CKSNITGLLL TRDGGTDNTT EIFRPGGGNM KDNWRSELYK
481 YKVVEIKPLG IAPTEAKRRV VEREKRAVGI GAVLLGFLGA AGSTMGAASI TLTVQARQLL
541 SGIVQQQSNL LRAIEAQQHM LQLTVWGIKQ LQTRVLAIER YLKDQQLLGL WGCSGKLICT
601 TNVPWNSSWS NKSQQAIWDN MTWMQWDREI NNYTNIIYQL LEDSQIQQEQ NEKDLLALDK
661 WQNLWSWFSI TNWLWYIKIF IMIVGGLIGL RIIFAVLSIV NRVRQGYSPL SFQTLTPNPR
721 GPDRLGEIEE EGGEQDRDRS VRLVSGFLPL AWDDLRSLCL FSYHRLRDFI FDCSEDSGTS 
781 GTQQSQGTPE GWEVLKYLGS LVQYWGLELK RVLLVCLIPI AIAVAEGTDR IIELVLRFCR 
841 AIRNIPTRVR QGCEAALL* 






(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(19) World Intellectual Property Organization 

International Bureau 







(43) International Publication Date: 17 January 2002 (17.01.2002)

(10) International Publication Number: WO 02/004494 A3

(51) International Patent Classification 7: C07K 14/16, C12N 7/00, 7/02

(21) International Application Number: PCT/IB01/01208

(22) International Filing Date: 9 July 2001 (09.07.2001)

(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data:
60/216,995 7 July 2000 (07.07.2000) US
2000/3437 10 July 2000 (10.07.2000) ZA
2000/4924 15 September 2000 (15.09.2000) ZA

(71) Applicants (for all designated States except US): MEDICAL RESEARCH COUNCIL [ZA/ZA]; Francie van Zijl Drive, Parow Valley, 7500 Cape Town (ZA). UNIVERSITY OF CAPE TOWN [ZA/ZA]; Observatory, 7500 Cape Town (ZA). UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL [US/US]; CB 4100, Bynum Hall, Chapel Hill, NC 27599-4100 (US). ALPHAVAX INCORPORATED [US/US]; 2 Triangle Drive, Research Triangle Park, NC 27709-0307 (US).

(72) Inventors; and
(75) Inventors/Applicants (for US only): WILLIAMSON, Carolyn [ZA/ZA]; University of Cape Town, Observatory, 7500 Cape Town (ZA). SWANSTROM, Ronald, Ivar [US/US]; University of North Carolina at Chapel Hill, CB 4100 Bynum Hall, Chapel Hill, NC 27599-4100 (US). MORRIS, Lynn [ZA/ZA]; National Institute for Virology, Modderfontein Road, 2131 Sandringham (ZA). KARIM, Salim, Abdool [ZA/ZA]; Francie van Zijl Drive, Parow Valley, 7500 Cape Town (ZA). JOHNSTON, Robert, Edward [US/US]; University of North Carolina at Chapel Hill, CB 4100, Bynum Hall, Chapel Hill, NC 27599-4100 (US).

(74) Agents: CLELLAND, Sandra, Luischen et al.; Spoor and Fisher, P.O. Box 41312, 2024 Craighall (ZA).

(81) Designated States (national): AE, AG, AL, AM, AT, AU, 
AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU, 
CZ, DE, DK, DM, DZ, EC, EE, ES, FI, GB, GD, GE, GH,
GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, 
LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, 
MX, MZ, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, 
SL, TJ, TM, TR, TT, TZ, UA, UG, US, UZ, VN, YU, ZA, 
ZW. 

(84) Designated States (regional): ARIPO patent (GH, GM, 
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZW), Eurasian 
patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European 
patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, 
IT, LU, MC, NL, PT, SE, TR), OAPI patent (BF, BJ, CF, 
CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG). 

Published: 

— with international search report 

(88) Date of publication of the international search report: 

13 March 2003 

For two-letter codes and other abbreviations, refer to the "Guidance Notes on Codes and Abbreviations" appearing at the beginning of each regular issue of the PCT Gazette.



(54) Title: PROCESS FOR THE SELECTION OF HIV-1 SUBTYPE C ISOLATES, SELECTED HIV-1 SUBTYPE ISOLATES, THEIR GENES AND MODIFICATIONS AND DERIVATIVES THEREOF

(57) Abstract: The invention provides a process for the selection of HIV-1 subtype (clade) C isolates, selected HIV-1 subtype C isolates, their genes and modifications and derivatives thereof for use in prophylactic and therapeutic vaccines to produce proteins and polypeptides for the purpose of eliciting protection against HIV infection or disease. The process for the selection of HIV subtype isolates comprises the steps of isolating viruses from recently infected subjects; generating a consensus sequence for at least part of at least one HIV gene by identifying the most common codon or amino acid among the isolated viruses; and selecting the isolated virus or viruses with a high sequence identity to the consensus sequence. HIV-1 subtype C isolates, designated Du422, Du151 and Du179 (assigned Accession Numbers 01032114, 00072724 and 00072725, respectively, by the European Collection of Cell Cultures) are also provided.
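
The selection steps summarised in the abstract (build a consensus from the most common residue at each position, then pick the isolate closest to it) can be illustrated with a minimal Python sketch. It assumes the sequences have already been aligned to equal length; the isolate names and short peptide fragments are hypothetical toy data, not sequences from this application, and the function names `consensus`, `identity` and `rank_by_identity` are illustrative only.

```python
from collections import Counter


def consensus(aligned):
    """Return the most common residue at each column of the aligned sequences."""
    if len({len(seq) for seq in aligned.values()}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    columns = zip(*aligned.values())
    # Ties are broken in favour of the residue seen first in the column.
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)


def identity(a, b):
    """Fraction of aligned positions at which two sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def rank_by_identity(aligned):
    """Rank isolates by identity to the consensus, highest first."""
    cons = consensus(aligned)
    scored = [(name, identity(seq, cons)) for name, seq in aligned.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical aligned fragments from three hypothetical isolates.
toy = {"isoA": "MGARAS", "isoB": "MGARSS", "isoC": "MGSRAS"}
print(consensus(toy))
print(rank_by_identity(toy))
```

The same functions work on codon-level data if each sequence is first split into a list of three-letter codons, since `zip` and `Counter` operate on any hashable items.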



INTERNATIONAL SEARCH REPORT 



International Application No

PCT/IB 01/01208 



A. CLASSIFICATION OF SUBJECT MATTER

IPC 7 C07K14/16 C12N7/00 C12N7/02 



According to International Patent Classification (IPC) or to both national classification and IPC 



B. FIELDS SEARCHED 



Minimum documentation searched (classification system followed by classification symbols) 

IPC 7 C07K C12N 



Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched

Electronic data base consulted during the international search (name of data base and, where practical, search terms used)

EPO-Internal, EMBL, BIOSIS



C. DOCUMENTS CONSIDERED TO BE RELEVANT 



Category * / Citation of document, with indication, where appropriate, of the relevant passages / Relevant to claim No.

Category: X
Citation: DE BAAR M.P. ET AL.: "Subtype-specific sequence variation of the HIV type 1 long terminal repeat and primer-binding site", AIDS RES. AND HUMAN RETROVIR., vol. 16, no. 5, 20 March 2000 (2000-03-20), XP002221002; the whole document
Relevant to claim No.: 1-4,27

Category: A
Citation: TSCHERNING C. ET AL.: "Differences in chemokine coreceptor usage between genetic subtypes of HIV-1", VIROLOGY, vol. 241, 1998, pages 181-188, XP002221003; the whole document
Relevant to claim No.: 1-7,9,11,13-30

[X] Further documents are listed in the continuation of box C.

[ ] Patent family members are listed in annex.



* Special categories of cited documents:

'A' document defining the general state of the art which is not considered to be of particular relevance

'E' earlier document but published on or after the international filing date

'L' document which may throw doubts on priority claim(s) or which is cited to establish the publication date of another citation or other special reason (as specified)

'O' document referring to an oral disclosure, use, exhibition or other means

'P' document published prior to the international filing date but later than the priority date claimed

'T' later document published after the international filing date or priority date and not in conflict with the application but cited to understand the principle or theory underlying the invention

'X' document of particular relevance; the claimed invention cannot be considered novel or cannot be considered to involve an inventive step when the document is taken alone

'Y' document of particular relevance; the claimed invention cannot be considered to involve an inventive step when the document is combined with one or more other such documents, such combination being obvious to a person skilled in the art.

'&' document member of the same patent family



Date of the actual completion of the International search 

15 November 2002 


Date of mailing of the international search report 

02/12/2002 


Name and mailing address of the ISA 

European Patent Office, P.B. 5818 Patentlaan 2
NL-2280 HV Rijswijk
Tel. (+31-70) 340-2040, Tx. 31 651 epo nl,
Fax: (+31-70) 340-3016


Authorized officer 

Gain, I 



Form PCT/ISA/210 (second sheet) (July 1992)



C. (Continuation) DOCUMENTS CONSIDERED TO BE RELEVANT

Category: X
Citation: NOVITSKY V.A. ET AL.: "Molecular cloning and phylogenetic analysis of human immunodeficiency virus type 1 subtype C: a set of 23 full length clones from Botswana", JOURNAL OF VIROLOGY, THE AMERICAN SOCIETY FOR MICROBIOLOGY, US, vol. 73, no. 5, May 1999 (1999-05), pages 4427-4432, XP [illegible]
-& DATABASE EMBL PROTEINS 'Online! 1 November 1999 (1999-11-01), retrieved from EMBL, Database accession no. [illegible], XP002221006
* [illegible]% identity with seq. 1 *
* [illegible]% identity with seq. [illegible] *
* 94.6% identity with seq. 12 *
-& DATABASE EMBL PROTEINS 'Online! 1 November 1999 (1999-11-01), "Gag-pol polyprotein", retrieved from EMBL, Database accession no. [illegible], XP002221007
* aa [illegible]: [illegible]% identity with seq. 7 (aa [illegible]) *
-& DATABASE EMBL DNA 'Online! 11 March 1999 (1999-03-11), "HIV-1 isolate C-96BW04.09 [illegible]", retrieved from EMBL, Database accession no. [illegible], XP002221008
* nt [illegible]: [illegible]% identity with seq. [illegible] (nt [illegible]) *
-& DATABASE EMBL DNA 'Online! 11 March 1999 (1999-03-11), "HIV-1 isolate C-96BW11.[illegible], country Botswana", retrieved from EMBL, Database accession no. [illegible], XP002221009
* nt [illegible]: [illegible]% identity with seq. 10 (nt [illegible]) *
-& DATABASE EMBL DNA 'Online! "HIV-1 isolate C-96BW15B03 country Botswana", retrieved from EMBL, Database accession no. AF110973, XP002221010
* nt 280-1760: 75% identity with seq. 4 (nt 91-1571) *
Relevant to claim No.: 14-16



Category: Y
Citation: GAO F. ET AL.: "Molecular cloning and analysis of functional envelope genes from human immunodeficiency virus type 1 sequence subtypes A through G. The WHO and NIAID Networks for HIV Isolation and Characterization", JOURNAL OF VIROLOGY, THE AMERICAN SOCIETY FOR MICROBIOLOGY, US, vol. 70, no. 3, March 1996 (1996-03), pages 1651-1667, XP [illegible]
-& DATABASE EMBL PROTEINS 'Online! 1 November 1996 (1996-11-01), "Envelope glycoprotein", retrieved from EMBL, Database accession no. [illegible]
* aa [illegible]: [illegible]% identity with seq. [illegible] *
* aa [illegible]: about [illegible]% identity with seq. [illegible] *
Relevant to claim No.: 20

Category: [illegible]
Citation: LOLE K.S. ET AL.: "FULL-LENGTH HUMAN IMMUNODEFICIENCY VIRUS TYPE 1 GENOMES FROM SUBTYPE C-INFECTED SEROCONVERTERS IN INDIA, WITH EVIDENCE OF INTERSUBTYPE RECOMBINATION", JOURNAL OF VIROLOGY, THE AMERICAN SOCIETY FOR MICROBIOLOGY, US, vol. 73, no. 1, January 1999 (1999-01), pages 152-160, XP [illegible], ISSN: 0022-538X
-& DATABASE EMBL PROTEINS 'Online! 1 November 1998 (1998-11-01), "envelope protein", retrieved from EMBL, Database accession no. O90096, XP002221012
* 83% identity with seq. 9 *
* 82% identity with seq. 11 (aa 24-857) *



Category: P,X
Citation: LEIGH BROWN A.J. ET AL.: "Reduced susceptibility of human immunodeficiency virus type 1 (HIV-1) from patients with primary HIV infection to nonnucleoside reverse transcriptase inhibitors is associated with variation at novel amino acid sites.", J. VIROL., vol. 74, no. 22, November 2000 (2000-11), pages 10269-10273, XP002221004
-& DATABASE EMBL PROTEINS 'Online! 1 June 2001 (2001-06-01), "Reverse transcriptase", retrieved from EMBL, Database accession no. Q99FC3, XP002221013
* aa 25-302: 96.7% identity with seq. 2 *
* aa 26-302: 97.5% identity with seq. 13 *
Relevant to claim No.: 18

Category: A
Citation: DATABASE EMBL DNA 'Online! 2 January 1996 (1996-01-02), "HIV-1 isolate BU/91/07, envelope", retrieved from EMBL, Database accession no. HI1U39249, XP002221014
* nt 420-2559: 70% identity with seq. 8 (nt 402-2541) *
Relevant to claim No.: 13

Category: A
Citation: VAN HARMELEN J.H. ET AL.: "A predominantly HIV Type 1 subtype C-restricted epidemic in South African urban populations", AIDS RES. AND HUMAN RETROVIR., vol. 15, no. 4, 1999, pages 395-398, XP002221005



INTERNATIONAL SEARCH REPORT 



International application No.
PCT/IB 01/01208 



Box I Observations where certain claims were found unsearchable (Continuation of item 1 of first sheet) 

This International Search Report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons:

1. Claims Nos.:
because they relate to subject matter not required to be searched by this Authority, namely:

2. Claims Nos.: 8,10,12,31
because they relate to parts of the International Application that do not comply with the prescribed requirements to such an extent that no meaningful International Search can be carried out, specifically:
see FURTHER INFORMATION sheet PCT/ISA/210

3. Claims Nos.:
because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a).

Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet) 

This International Searching Authority found multiple inventions in this international application, as follows:

see additional sheet 



1. As all required additional search fees were timely paid by the applicant, this International Search Report covers all searchable claims.

2. As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment of any additional fee.

3. As only some of the required additional search fees were timely paid by the applicant, this International Search Report covers only those claims for which fees were paid, specifically claims Nos.:

4. No required additional search fees were timely paid by the applicant. Consequently, this International Search Report is restricted to the invention first mentioned in the claims; it is covered by claims Nos.:

Remark on Protest

The additional search fees were accompanied by the applicant's protest.

No protest accompanied the payment of additional search fees.



Form PCT/ISA/210 (continuation of first sheet (1)) (July 1998) 



International Application No. PCT/IB 01/01208

FURTHER INFORMATION CONTINUED FROM PCT/ISA/ 210 

This International Searching Authority found multiple (groups of) 
inventions in this international application, as follows: 

1. Claims: 1-4 

A process for the selection of HIV subtype isolates.

2. Claims: 5,8,9,16,17,28,31 and partly 24-26 

An HIV-1 subtype C, designated Du422 and its gag,pol,env 
sequences and consensus sequences. 

3. Claims: 6,10,11,12,13,18,19,20,21,29 and partly 24-26